Hindi हिन्दी

Hindi written in the Devanagari script.

Where it's used

Hindi is written in Devanagari and appears across printed and handwritten records from North India. For Hindi, gurmukhifix focuses on matra (vowel-sign) attachment and nasalisation.

What Tesseract gets wrong

Matra with no consonant

A dependent vowel (matra) must attach to a consonant. A matra at the start of a word, or two in a row, is structurally impossible and signals an OCR error.

ाकमल → flagged
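The two cases described here (a matra opening a word, or two matras in a row) can be checked mechanically. The following is a minimal Python sketch, not gurmukhifix's actual code: the function name `matra_errors` and the exact set of code points treated as matras are assumptions made for illustration.

```python
# Dependent vowel signs (matras) in the Devanagari block: U+093E..U+094C,
# plus the vocalic-l signs U+0962 and U+0963. (Assumed set, for illustration.)
MATRAS = {chr(cp) for cp in range(0x093E, 0x094D)} | {"\u0962", "\u0963"}


def matra_errors(text):
    """Return (word, index, reason) for every structurally impossible matra."""
    errors = []
    for word in text.split():
        for i, ch in enumerate(word):
            if ch not in MATRAS:
                continue
            if i == 0:
                # Nothing precedes the matra, so it has no consonant to attach to.
                errors.append((word, i, "matra at word start"))
            elif word[i - 1] in MATRAS:
                # The previous character is itself a matra, not a consonant.
                errors.append((word, i, "two matras in a row"))
    return errors


if __name__ == "__main__":
    for item in matra_errors("\u093e\u0915\u092e\u0932 \u0926\u0947\u0936"):
        print(item)
```

A checker like this only reports problems; it does not guess what the intended word was, which matches the "flagged for review" behaviour shown in the examples.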

Anusvara vs chandrabindu

The nasalisation marks ं (anusvara) and ँ (chandrabindu) are easily confused.
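The two marks are distinct code points (U+0902 and U+0901), so a reviewer can at least locate every occurrence and see which one the OCR produced. The sketch below is an illustrative Python helper; `nasal_marks` is a hypothetical name, and nothing here claims to be how gurmukhifix decides between the two marks.

```python
import unicodedata

# U+0901 is chandrabindu, U+0902 is anusvara.
NASAL_MARKS = {"\u0901", "\u0902"}


def nasal_marks(text):
    """List (index, Unicode name) for each nasalisation mark in text."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in NASAL_MARKS
    ]


if __name__ == "__main__":
    print(nasal_marks("\u0939\u093f\u0902\u0926\u0940"))
```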

Sibilant ambiguity

श, ष and स look similar in many hands and are routinely swapped.
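One way to reason about a suspected sibilant swap is to enumerate the alternative spellings and compare them against a word list. The Python sketch below shows only the enumeration step; `sibilant_variants` is a hypothetical helper, and the page does not say whether gurmukhifix works this way.

```python
from itertools import product

SIBILANTS = "\u0936\u0937\u0938"  # श, ष, स


def sibilant_variants(word):
    """Return every spelling of word with its sibilants swapped, excluding word itself."""
    # Each sibilant position may hold any of the three letters;
    # every other character stays fixed.
    options = [SIBILANTS if ch in SIBILANTS else ch for ch in word]
    return {"".join(chars) for chars in product(*options)} - {word}


if __name__ == "__main__":
    print(sorted(sibilant_variants("\u0926\u0947\u0936")))
```

The candidates would then need independent evidence, such as a lexicon lookup, before any of them replaces the original spelling.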

How gurmukhifix fixes it

Examples

| Raw OCR | gurmukhifix | What happened |
|---|---|---|
| नमस्ते दुनिया | नमस्ते दुनिया | Already correct, left untouched |
| हिंदी भाषा | हिंदी भाषा | Valid, passes through |
| ाकमल देश | ाकमल देश ⚑ | Vowel sign at word start, flagged for review |

Why this beats the alternatives

vs. Tesseract alone

Tesseract turns pixels into characters. It has no linguistic knowledge — it can't know that a dependent vowel may not begin a word, or that a sign written to the left of a letter must be encoded after it. gurmukhifix adds exactly those rules.
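The second rule can be illustrated with the short-i sign ि (U+093F), which is drawn to the left of its consonant but must follow it in the encoded text. The Python sketch below moves a ि that has no consonant before it to the end of the consonant cluster that follows. It is an assumption-laden illustration: the name `to_logical_order`, the consonant range and the cluster pattern are not taken from gurmukhifix.

```python
import re

CONS = "\u0915-\u0939"   # basic Devanagari consonants, क through ह
NUKTA = "\u093c"
VIRAMA = "\u094d"
I_MATRA = "\u093f"       # ि, drawn left of the consonant it belongs to

# A consonant cluster: zero or more (consonant + virama) pairs, then a final consonant.
CLUSTER = f"(?:[{CONS}]{NUKTA}?{VIRAMA})*[{CONS}]{NUKTA}?"

# Match ि only when it is NOT already preceded by a consonant (or nukta),
# so correctly encoded text is left alone.
VISUAL_ORDER = re.compile(f"(?<![{CONS}{NUKTA}]){I_MATRA}({CLUSTER})")


def to_logical_order(text):
    """Move a visually ordered ि to after the cluster it was written before."""
    return VISUAL_ORDER.sub(lambda m: m.group(1) + I_MATRA, text)


if __name__ == "__main__":
    fixed = to_logical_order("\u093f\u0915")
    print([f"U+{ord(ch):04X}" for ch in fixed])
```

The negative lookbehind is the important design choice: without it, the ि in an already correct word would be pushed onto the next consonant.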

vs. find-and-replace / spellcheck

A blind substitution table rewrites correct letters too and corrupts good text. gurmukhifix is evidence-gated: a fix is applied only when it makes the text more valid, so already-correct Unicode is never changed.
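The idea of evidence gating can be sketched in a few lines of Python: score the text before and after a proposed fix, and keep the fix only if the score improves. Here the score is simply the number of misplaced matras; both `count_violations` and `gated_fix` are hypothetical names, and the real tool's validity measure is not described on this page.

```python
# Dependent vowel signs (matras): U+093E..U+094C plus U+0962 and U+0963.
# (Assumed set, for illustration.)
MATRAS = {chr(cp) for cp in range(0x093E, 0x094D)} | {"\u0962", "\u0963"}


def count_violations(text):
    """Count matras that start a word or directly follow another matra."""
    count = 0
    for word in text.split():
        for i, ch in enumerate(word):
            if ch in MATRAS and (i == 0 or word[i - 1] in MATRAS):
                count += 1
    return count


def gated_fix(original, candidate):
    """Accept candidate only if it has strictly fewer violations than original."""
    if count_violations(candidate) < count_violations(original):
        return candidate
    return original


if __name__ == "__main__":
    # A doubled matra is repaired; already valid text is returned unchanged.
    print(gated_fix("\u0915\u093e\u093e", "\u0915\u093e"))
    print(gated_fix("\u0928\u092e\u0938\u094d\u0924\u0947", "\u0928\u092e\u0938\u0924\u0947"))
```

Because the comparison is strict, a candidate that is merely "no worse" never replaces the original, which is what keeps already-correct text untouched.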

vs. doing nothing

Raw OCR often looks right but is malformed Unicode — wrong code-point order, dropped marks. That silently breaks search, indexing, fonts and copy-paste. gurmukhifix produces canonical, well-formed text.
