How Sindhi Unicode Works

Most people assume Arabic-script keyboards all pull from the same character pool. They don't: Sindhi needs letters that Arabic and Urdu keyboards lack. Here's exactly which code points it uses, and why that matters when you're typing or publishing.

When you type a Sindhi letter on this keyboard and paste it into WhatsApp or a Word document, something specific happens under the hood: your device stores a number, not a picture of a letter. That number is a Unicode code point: a permanent, unique identifier assigned to each character that the Unicode Standard encodes. For Sindhi letters, exactly which number gets stored matters more than most people realise.
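As a minimal illustration, assuming Python 3 is available (the choice of language is ours, not something the keyboard depends on), the built-in ord and chr functions expose the number behind a letter. The letter is written as an escape sequence so the snippet stays readable in any editor.

```python
# The Sindhi implosive b, written as an escape so the source stays ASCII.
letter = "\u067B"

code_point = ord(letter)               # the integer that is actually stored
label = f"U+{code_point:04X}"          # conventional U+XXXX notation
utf8_bytes = letter.encode("utf-8")    # how that integer is serialised in a UTF-8 file

print(label)                   # U+067B
print(utf8_bytes.hex())        # d9bb
print(chr(0x067B) == letter)   # True
```

The two bytes are what ends up in a saved file or a chat message; the font only comes into play later, when something has to draw them.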

Two Blocks, One Script

The Unicode Standard organises characters into named blocks. Arabic-script characters live primarily in two blocks relevant to Sindhi:

  • Arabic block: U+0600 through U+06FF (256 code points). This is where the 28 core Arabic letters live, along with the extensions used by Urdu, Persian, Pashto, and Sindhi.
  • Arabic Supplement block: U+0750 through U+077F (48 code points). Added in Unicode 4.1 to cover further Arabic-script languages that the Arabic block had left underrepresented.

The letters Sindhi shares with Arabic or Urdu (ا ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ک ل م ن و ه ي) are in the Arabic block. The letters unique to Sindhi, particularly the implosive, aspirated and retroflex series, sit in the extended range of that same block: every letter in the table below falls between U+067A and U+06BB. It is easy to assume the Sindhi additions live in the Arabic Supplement, but they do not.
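To check which block a given character falls in, a small lookup is enough. The sketch below is in Python and hard-codes only the two block ranges quoted above; the names BLOCKS and block_of are our own, chosen for illustration.

```python
# Block ranges as quoted in the list above.
BLOCKS = [
    ("Arabic", 0x0600, 0x06FF),
    ("Arabic Supplement", 0x0750, 0x077F),
]

def block_of(ch: str) -> str:
    """Return the name of the block containing ch, or 'other'."""
    cp = ord(ch)
    for name, start, end in BLOCKS:
        if start <= cp <= end:
            return name
    return "other"

# Block sizes follow directly from the ranges.
for name, start, end in BLOCKS:
    print(name, end - start + 1)

# A few sample characters: beh, the Sindhi implosive b, rnoon, and U+0750.
for ch in ("\u0628", "\u067B", "\u06BB", "\u0750"):
    print(f"U+{ord(ch):04X}", block_of(ch))
```

Running it on the code points in the table further down is a quick way to see where each Sindhi letter actually sits.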

The Sindhi-Specific Code Points

Here are the letters that appear in Sindhi writing but not in standard Urdu or Arabic, along with their Unicode code points:

Letter  Code Point  Unicode Name                                          Phonetic Value
ٻ       U+067B      ARABIC LETTER BEEH                                    Implosive bilabial /ɓ/
ٿ       U+067F      ARABIC LETTER TEHEH                                   Aspirated dental /tʰ/
ڃ       U+0683      ARABIC LETTER NYEH                                    Palatal nasal /ɲ/
ڄ       U+0684      ARABIC LETTER DYEH                                    Implosive palatal /ʄ/
ڀ       U+0680      ARABIC LETTER BEHEH                                   Aspirated bilabial /bʱ/
ڌ       U+068C      ARABIC LETTER DAHAL                                   Aspirated dental /dʱ/
ڍ       U+068D      ARABIC LETTER DDAHAL                                  Aspirated retroflex /ɖʱ/
ڊ       U+068A      ARABIC LETTER DAL WITH DOT BELOW                      Retroflex /ɖ/
ڻ       U+06BB      ARABIC LETTER RNOON                                   Retroflex nasal /ɳ/
ڙ       U+0699      ARABIC LETTER REH WITH FOUR DOTS ABOVE                Retroflex flap /ɽ/
ڪ       U+06AA      ARABIC LETTER SWASH KAF                               /k/ (Sindhi form)
ڦ       U+06A6      ARABIC LETTER PEHEH                                   Aspirated /pʰ/
ٺ       U+067A      ARABIC LETTER TTEHEH                                  Aspirated retroflex /ʈʰ/
ٽ       U+067D      ARABIC LETTER TEH WITH THREE DOTS ABOVE DOWNWARDS     Retroflex /ʈ/

Try it: Open the Sindhi keyboard, type the letter ٻ, then copy it and paste it into any Unicode inspector (a tool that shows the code point of a pasted character). You will see U+067B, confirming you have the correct implosive b, not the Urdu ب at U+0628.
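If you would rather check from a script than from a web tool, Python's standard unicodedata module carries a copy of the Unicode Character Database names. The sketch below looks up the code points from the table; the list name SINDHI_CODE_POINTS and the helper describe are our own, and the names printed depend on the Unicode version bundled with your Python (these particular letters have been stable for a long time).

```python
import unicodedata

# Code points taken from the table above.
SINDHI_CODE_POINTS = [
    0x067A, 0x067B, 0x067D, 0x067F, 0x0680, 0x0683, 0x0684,
    0x068A, 0x068C, 0x068D, 0x0699, 0x06A6, 0x06AA, 0x06BB,
]

def describe(cp: int) -> str:
    """Format one code point as 'U+XXXX NAME' using Python's bundled database."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"

for cp in SINDHI_CODE_POINTS:
    print(describe(cp))

# The Urdu/Arabic beh, for comparison with U+067B.
print(describe(0x0628))
```

The same describe call works on a character pasted from anywhere, for example describe(ord(pasted_text[0])).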

Why Substituting Urdu Characters Breaks Things

This is not an abstract typographic concern. Consider the Sindhi word for "father," which uses the implosive ٻ. If a writer uses standard ب instead — a common workaround when a proper Sindhi keyboard is not available — the stored text contains U+0628 where it should contain U+067B. Visually, the difference may be imperceptible in some fonts. But:

  • Search breaks: A search for the Sindhi word "ٻاپ" does not match documents that store it as "باپ", and a search for "باپ" returns text where a different word was meant. Search indices treat the two as distinct strings.
  • Font rendering diverges: A font with Sindhi coverage draws U+067B with its own glyph, distinct from ب. Text stored as U+0628 is drawn as ب in every font, so the phonemic distinction is erased no matter how good the font is.
  • Screen readers misread: Text-to-speech software with language-specific pronunciation rules for Arabic-script languages can read U+067B differently from U+0628, so substitution can produce incorrect pronunciation.
  • Archival integrity suffers: A document stored with incorrect code points may display correctly today — in the font currently installed on your machine — but become ambiguous or incorrect when opened in a future application or on a device with different fonts.
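The search problem in particular is easy to reproduce. The sketch below, in Python, uses the two spellings from the first bullet and checks whether any of the standard Unicode normalisation forms would reconcile them; the helper name same_after_normalisation is ours.

```python
import unicodedata

correct = "\u067B\u0627\u067E"      # starts with U+067B (Sindhi implosive b)
substituted = "\u0628\u0627\u067E"  # same spelling, but starts with U+0628 (beh)

def same_after_normalisation(a: str, b: str) -> bool:
    """True if any of the four Unicode normalisation forms makes a and b equal."""
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    )

# A document in which the word was typed with the substitute letter.
document = "intro " + substituted + " outro"

print(correct == substituted)                          # False
print(correct in document)                             # False
print(same_after_normalisation(correct, substituted))  # False
```

Because normalisation does not help, the only real fix is to store the right code point in the first place.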

Sindhi Numerals and Punctuation

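Sindhi uses the Extended Arabic-Indic digits (۰ ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹) at code points U+06F0 through U+06F9, the same code points used in Urdu and Persian. This is one area where the languages share code points completely. The preferred shapes of a few digits (notably 4, 6 and 7) can still differ from language to language, but that choice is made by the font, not by the stored text.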

Sindhi punctuation includes the Arabic comma (،, U+060C), the Arabic full stop (۔, U+06D4), the Arabic semicolon (؛, U+061B), and the Arabic question mark (؟, U+061F). It also uses ۽ (U+06FD), a Sindhi-specific conjunction mark that has no exact Urdu equivalent.
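Because these digits carry the Unicode decimal-digit property, software can treat them as ordinary numbers. A small Python sketch follows; the function name to_extended_arabic_indic is hypothetical and handles only non-negative integers.

```python
import unicodedata

# The digits 2, 0, 2, 4 taken from the range U+06F0..U+06F9.
sindhi_number = "\u06F2\u06F0\u06F2\u06F4"

# int() accepts any characters that Unicode classifies as decimal digits.
print(int(sindhi_number))  # 2024

def to_extended_arabic_indic(n: int) -> str:
    """Render a non-negative integer with the digits U+06F0..U+06F9."""
    return "".join(chr(0x06F0 + int(d)) for d in str(n))

print(to_extended_arabic_indic(2024) == sindhi_number)  # True

# The Sindhi-specific sign mentioned above has its own entry in the database.
print(unicodedata.name("\u06FD"))  # ARABIC SIGN SINDHI AMPERSAND
```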

How Unicode Characters Become Visible Letters

A common confusion: Unicode stores an abstract number, not a picture. The actual shape you see on screen is determined by the font your application uses to render that number. This means the same Sindhi text can look quite different depending on whether it is rendered in Jameel Noori Nastaleeq, Noto Nastaliq Urdu, Mehr Nastaleeq, or a general-purpose font like Arial Unicode MS.

Nastaliq rendering, the flowing diagonal style associated above all with Urdu calligraphy (Sindhi is more commonly typeset in Naskh), is computationally complex. It requires glyph substitution and positioning rules (OpenType GSUB and GPOS tables) that account for hundreds of contextual forms. Not all fonts handle all Sindhi code points correctly. A font that correctly renders every Urdu character may still display some Sindhi-specific letters as empty boxes or in an incorrect form.

For production use (publishing, printing, archiving), the most reliable fonts for Sindhi currently are Jameel Noori Nastaleeq (widely used on Windows and available for download elsewhere) and Noto Nastaliq Urdu (which covers most Sindhi code points and is freely distributed by Google). Whichever font you choose, confirm that it has glyphs for the Sindhi-specific letters in the Arabic block's extended range before committing to it.
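A coverage check of that kind can be sketched in a few lines of Python. The function below takes the set of code points a font supports and reports which characters of a text fall outside it. The coverage set used here is made up for illustration; for a real font, one option (an assumption, not run here) is the third-party fontTools package, where set(TTFont(path).getBestCmap()) yields the supported code points.

```python
def missing_code_points(text: str, supported: set[int]) -> list[str]:
    """List, in U+XXXX form, the distinct non-space characters of text absent from supported."""
    missing = {
        ord(ch) for ch in text
        if not ch.isspace() and ord(ch) not in supported
    }
    return [f"U+{cp:04X}" for cp in sorted(missing)]

# Hypothetical coverage: a font with the basic Arabic letters (U+0621..U+064A)
# but none of the extended letters that Sindhi needs.
basic_arabic_only = set(range(0x0621, 0x064B))

# U+067B followed by alef and reh: only the first letter is unsupported.
print(missing_code_points("\u067B\u0627\u0631", basic_arabic_only))  # ['U+067B']
```

An empty list means every character in the sample has a glyph; it says nothing about whether the shaping of joined forms is good, which still needs a visual check.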

The Practical Upshot

If you use the Sindhi keyboard on this site, all of this happens correctly without any action on your part. Every key maps to the verified Unicode code point for the intended Sindhi character. The output is standard, portable, and compatible with any Unicode-aware application.

If you are building a Sindhi text application, integrating a Sindhi input method, or importing Sindhi text from older sources, verifying code points is worth the time. The Unicode Character Database at unicode.org is the authoritative reference — not any single font or keyboard application's interpretation of it.
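For imported text, one simple way to start verifying is to tally every non-ASCII character with its code point and name, rarest first, since a stray substitute letter tends to show up only a handful of times. This is a rough Python sketch with a function name (audit) of our own, not a complete validation tool.

```python
from collections import Counter
import unicodedata

def audit(text: str) -> list[tuple[str, str, int]]:
    """Return (code point, name, count) for every non-ASCII character, rarest first."""
    counts = Counter(ch for ch in text if ord(ch) > 0x7F)
    rows = [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), n)
        for ch, n in counts.items()
    ]
    # Sort by count, then by code point label, so the output is stable.
    return sorted(rows, key=lambda row: (row[2], row[0]))

# Two three-letter strings that differ only in the first letter (U+067B vs U+0628).
sample = "\u067B\u0627\u0631 \u0628\u0627\u0631"
for row in audit(sample):
    print(row)
```

Seeing both ARABIC LETTER BEH and ARABIC LETTER BEEH in the report for a Sindhi text is a prompt to look at where each one occurs.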

Understanding where Sindhi lives in Unicode also helps contextualise why the script differences between Sindhi and Urdu are not just a matter of aesthetics but of technical correctness — which is the subject of the next article in this series.

Written by

Ayaz

Digital publisher and language technology enthusiast. Builds web tools for South Asian language communities with a focus on Unicode correctness and practical accessibility. Runs several Sindhi and Pakistani language content sites.