
Devanagari देवनागरी

The shared base script behind Hindi, Marathi, Nepali and Sanskrit.

Where it's used

Use `--lang devanagari` for mixed or non-Hindi Devanagari material such as Marathi, Nepali and Sanskrit. It inherits the Hindi rules and adds script-general ones.
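Inheriting a rule set can be pictured as walking a chain of parent configs. This is a minimal sketch. It assumes each language config names its parent through an `extends` key. The registry layout, `resolve_rules` and the rule names are hypothetical and do not come from gurmukhifix's real configuration.

```python
from typing import Optional

# Hypothetical registry: layout and rule names are illustrative only,
# not gurmukhifix's real configuration.
CONFIGS: dict = {
    "hindi": {"extends": None, "rules": ["orphan_matra", "matra_order"]},
    "devanagari": {
        "extends": "hindi",
        "rules": ["strip_vedic_accents", "strip_stray_avagraha"],
    },
}


def resolve_rules(lang: str) -> list[str]:
    """Collect rules from the root of the extends chain down to `lang`, without duplicates."""
    chain: list[str] = []
    current: Optional[str] = lang
    while current is not None:
        if current in chain:
            raise ValueError(f"cycle in extends chain at {current!r}")
        chain.append(current)
        current = CONFIGS[current]["extends"]
    rules: list[str] = []
    for name in reversed(chain):  # parent rules first, child additions after
        for rule in CONFIGS[name]["rules"]:
            if rule not in rules:
                rules.append(rule)
    return rules
```

Under these assumptions, `resolve_rules("devanagari")` returns the two Hindi rules followed by the two script-general ones. Putting parent rules first means a child only adds to the base behaviour and never silently drops it.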

What Tesseract gets wrong

Same structural matra rules

Every Devanagari language shares the consonant + matra structure, so the same orphaned-matra checks apply.

`ाक` → flagged
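The orphaned-matra check can be sketched with Python's `unicodedata` module, because Unicode names every Devanagari dependent vowel sign "DEVANAGARI VOWEL SIGN …". This sketch assumes that a word is any whitespace-separated token. The function names are hypothetical and are not gurmukhifix's API.

```python
import unicodedata


def is_dependent_vowel_sign(ch: str) -> bool:
    # Every Devanagari matra is named "DEVANAGARI VOWEL SIGN ..." in Unicode.
    return unicodedata.name(ch, "").startswith("DEVANAGARI VOWEL SIGN")


def orphaned_matras(text: str) -> list[str]:
    """Return the words that begin with a dependent vowel sign."""
    # str.split() with no argument never yields empty strings, so w[0] is safe.
    return [w for w in text.split() if is_dependent_vowel_sign(w[0])]
```

For example, `orphaned_matras("भारत देश")` returns an empty list, while `orphaned_matras("ाक")` returns `["ाक"]`.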

Stray Vedic accents

Sanskrit scans pick up spurious udatta and anudatta accent marks (U+0951 and U+0952) from speckle.
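Genuine Vedic texts do carry accents, so a fix has to decide which marks are noise. This sketch uses one conservative heuristic, which is an assumption and not gurmukhifix's documented rule: an accent is dropped when the character before it cannot carry it, or when it repeats the previous accent. The names are hypothetical.

```python
import unicodedata

UDATTA = "\u0951"    # DEVANAGARI STRESS SIGN UDATTA
ANUDATTA = "\u0952"  # DEVANAGARI STRESS SIGN ANUDATTA
ACCENTS = {UDATTA, ANUDATTA}


def can_carry_accent(ch: str) -> bool:
    """True for Devanagari letters and vowel/combining signs (not dandas, digits or accents)."""
    return (
        "\u0900" <= ch <= "\u097f"
        and ch not in ACCENTS
        and unicodedata.category(ch) in {"Lo", "Mn", "Mc"}
    )


def strip_spurious_accents(text: str) -> str:
    """Assumed heuristic: drop accents with no Devanagari base, or stacked on another accent."""
    out: list[str] = []
    for ch in text:
        if ch in ACCENTS and not (out and can_carry_accent(out[-1])):
            continue  # orphaned or stacked accent: treated as speckle
        out.append(ch)
    return "".join(out)
```

An accent that sits on a letter, as in `"अ\u0951"`, is left alone. An accent at the start of the text or after a space or danda is removed.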

Avagraha noise

An avagraha (ऽ) in OCR output is often a misread danda (।) or some other stray mark.
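One way to tell a real avagraha from noise is by position. This sketch assumes that a genuine avagraha always sits inside a word, directly after a Devanagari letter or sign, as in सोऽहम्. Any avagraha with something else before it is removed. This heuristic and the name `strip_stray_avagraha` are assumptions made for illustration.

```python
import re

# Assumed heuristic: keep ऽ (U+093D) only when the preceding character is a
# Devanagari letter or sign. Start of text, spaces, dandas (U+0964/U+0965),
# digits and another avagraha all count as "no valid base".
STRAY_AVAGRAHA = re.compile(r"(?<![\u0900-\u093c\u093e-\u0963\u0971-\u097f])\u093d")


def strip_stray_avagraha(text: str) -> str:
    # Only the mark is removed; any whitespace left behind is not tidied here.
    return STRAY_AVAGRAHA.sub("", text)
```

With this rule, `strip_stray_avagraha("सोऽहम्")` is unchanged, and `strip_stray_avagraha("ऽराम")` gives `"राम"`.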

How gurmukhifix fixes it

Inherits the full Hindi rule set through `extends` in its config.
Strips spurious Vedic accent marks and stray avagraha.
Applies the Devanagari matra-validity checks.
Normalises to NFC.

Examples

Raw OCR	gurmukhifix output	What happened
`भारत देश`	`भारत देश`	Already correct — untouched
`मराठी भाषा`	`मराठी भाषा`	Valid Marathi — passes through
`ाक`	`ाक ⚑`	Vowel sign at word start — flagged

Why this beats the alternatives

vs. Tesseract alone

Tesseract turns pixels into characters. Its recognition models do not enforce the script's Unicode encoding rules. It can't know that a dependent vowel sign may not begin a word, or that a sign drawn to the left of a consonant (such as ि in कि) must be encoded after it. gurmukhifix adds exactly those rules.
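The left-drawn ि rule can be sketched as a reordering pass. It assumes that OCR sometimes emits ि in visual order, before the consonant or conjunct it belongs to. The pass moves the sign to its logical position after the whole cluster. Only an ि that is not already attached to a preceding consonant is touched. The regex and the name `fix_visual_order_i` are hypothetical.

```python
import re

# A consonant: U+0915-U+0939 or a precomposed nukta letter U+0958-U+095F,
# optionally followed by a nukta (U+093C).
CONS = r"[\u0915-\u0939\u0958-\u095f]\u093c?"
# A cluster: zero or more consonant+virama (U+094D) pairs, then a final consonant.
CLUSTER = rf"(?:{CONS}\u094d)*{CONS}"
# ि (U+093F) with no consonant before it, followed by a cluster.
VISUAL_I = re.compile(rf"(?<![\u0915-\u0939\u0958-\u095f\u093c])\u093f({CLUSTER})")


def fix_visual_order_i(text: str) -> str:
    """Move a visually ordered ि to the logical position after its consonant cluster."""
    return VISUAL_I.sub(lambda m: m.group(1) + "\u093f", text)
```

Under this assumption, `fix_visual_order_i("िक")` yields `"कि"`, and `fix_visual_order_i("िस्थति")` yields `"स्थिति"`, because the sign moves past the whole conjunct स्थ. Text that is already in logical order, such as `"किताब"`, does not match and passes through.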

vs. find-and-replace / spellcheck

A blind substitution table rewrites correct letters too and corrupts good text. gurmukhifix is evidence-gated: a fix is applied only when it makes the text more valid, so already-correct Unicode is never changed.
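Evidence gating can be sketched as "try the fix, keep it only if it helps". This sketch assumes that "more valid" means fewer rule violations, and it uses the orphaned-matra rule as a stand-in for the full rule set. `violations` and `gated` are hypothetical names.

```python
import unicodedata
from typing import Callable


def violations(text: str) -> int:
    """Count words starting with a dependent vowel sign (one stand-in validity rule)."""
    return sum(
        unicodedata.name(w[0], "").startswith("DEVANAGARI VOWEL SIGN")
        for w in text.split()
    )


def gated(fix: Callable[[str], str], text: str) -> str:
    """Apply `fix` only if it strictly reduces the violation count."""
    candidate = fix(text)
    return candidate if violations(candidate) < violations(text) else text
```

Because already-valid text has zero violations, no candidate can strictly improve it, so it is returned unchanged. For example, a blind rule such as `lambda s: s.replace("ा", "ो")` is rejected on `"भारत"`.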

vs. doing nothing

Raw OCR often looks right but is malformed Unicode — wrong code-point order, dropped marks. That silently breaks search, indexing, fonts and copy-paste. gurmukhifix produces canonical, well-formed text.
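Search breaking on visually identical text is easy to reproduce. The nukta letter क़ has a precomposed code point (U+0958) and a decomposed spelling (क + U+093C). The two compare unequal until both are normalised. The snippet uses only the standard `unicodedata` module.

```python
import unicodedata

precomposed = "\u0958"       # क़ as one code point (DEVANAGARI LETTER QA)
decomposed = "\u0915\u093c"  # क followed by the nukta sign

print(precomposed == decomposed)  # False: a naive search misses the match
nfc = [unicodedata.normalize("NFC", s) for s in (precomposed, decomposed)]
print(nfc[0] == nfc[1])           # True
print([f"U+{ord(c):04X}" for c in nfc[0]])  # ['U+0915', 'U+093C']
```

U+0958 is a composition exclusion, so NFC keeps the decomposed form for both spellings. NFC does not reorder a visually ordered ि, because vowel signs have canonical combining class 0. That is why the order and validity rules are needed as well as normalisation.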
