what is tashkeel? a guide to arabic diacritics
Open any Arabic news site (Al Jazeera, BBC Arabic, Al Arabiya) and you'll notice that the only marks above or below the letters are the dots that belong to the letters themselves. No small diagonal strokes, no tiny circles, no miniature letter marks. That's Arabic without tashkeel. Now open a Quran app, and suddenly every letter is decorated with precise marks that tell you exactly how to pronounce it. That's Arabic with tashkeel.
what tashkeel actually is
Tashkeel (تشكيل, literally "shaping") is the system of diacritical marks placed above or below Arabic letters to indicate vowels and pronunciation. Written Arabic doesn't normally include short vowels — the letters represent consonants and long vowels only. Tashkeel fills in the missing vowel sounds.
The core marks are:
- Fatha (فتحة) — َ — a small diagonal stroke above the letter. Indicates a short "a" sound. The letter ب with fatha (بَ) is pronounced "ba."
- Kasra (كسرة) — ِ — a small diagonal stroke below the letter. Indicates a short "i" sound. بِ is "bi."
- Damma (ضمة) — ُ — a small waw-like mark above the letter. Indicates a short "u" sound. بُ is "bu."
- Sukun (سكون) — ْ — a small circle above the letter. Indicates the absence of a vowel — the consonant is "closed." بْ is just "b" with no following vowel.
- Shadda (شدة) — ّ — a small w-shaped mark above the letter. Indicates the consonant is doubled (geminated). بّ is "bb."
- Tanween (تنوين) — ً ٍ ٌ — doubled versions of fatha, kasra, and damma, appearing at the end of words. They indicate an "n" sound added to the vowel: "an," "in," "un." In formal Arabic they mark indefinite nouns and adjectives, and the choice of vowel reflects the word's grammatical case.
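In digital text, each of these marks is a separate Unicode combining character that follows the letter it attaches to. As a minimal sketch using only Python's standard library, the snippet below lists the eight marks described above with their code points. The dictionary name `TASHKEEL` is an illustrative choice, not something from a particular tool.

```python
import unicodedata

# The eight core tashkeel marks as Unicode combining characters.
# TASHKEEL is a name chosen for this example only.
TASHKEEL = {
    "fathatan": "\u064B",  # tanween form of fatha ("an")
    "dammatan": "\u064C",  # tanween form of damma ("un")
    "kasratan": "\u064D",  # tanween form of kasra ("in")
    "fatha": "\u064E",
    "damma": "\u064F",
    "kasra": "\u0650",
    "shadda": "\u0651",
    "sukun": "\u0652",
}

# Print code point, official Unicode name, and the name used in this guide.
for name, mark in TASHKEEL.items():
    print(f"U+{ord(mark):04X}  {unicodedata.name(mark):<20}  {name}")

# A mark is stored after its base letter: the letter ba (U+0628) plus fatha.
ba = "\u0628" + TASHKEEL["fatha"]
print(len(ba))  # two code points that render as one marked letter
```

Because the marks are stored as separate characters, software can add or remove them without touching the base letters.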
when tashkeel matters
The Quran: Always fully vowelized. Mispronouncing a word in recitation can change its meaning, so every letter carries its exact diacritical marks. This is the one context where tashkeel is considered mandatory.
Children's books and Arabic-language learning materials: Textbooks for Arabic learners (both native children and foreign students) include full tashkeel to teach correct pronunciation. As readers become fluent, they gradually read texts without it.
Ambiguous words: Arabic has many words that are spelled identically but pronounced differently, with different meanings. The words عَلِمَ (ʿalima, "he knew") and عَلَّمَ (ʿallama, "he taught") are both written علم without tashkeel; only the marks tell them apart. In formal writing, tashkeel is added selectively to disambiguate such words.
Poetry: Classical Arabic poetry relies on precise meter (بحور), which requires knowing the exact vowel pattern. Published poetry collections typically include tashkeel.
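The ʿalima / ʿallama pair mentioned above can be checked in a few lines. This is a rough sketch, assuming the text only uses the eight core marks (U+064B through U+0652); Quranic text uses further marks outside that range, which this example would leave in place. The function name `strip_tashkeel` is hypothetical.

```python
import re

# Fathatan (U+064B) through sukun (U+0652): the core marks only.
TASHKEEL_RE = re.compile("[\u064B-\u0652]")

def strip_tashkeel(text: str) -> str:
    """Remove the core tashkeel marks, leaving base letters untouched."""
    return TASHKEEL_RE.sub("", text)

# ain + fatha, lam + kasra, meem + fatha: "he knew"
alima = "\u0639\u064E\u0644\u0650\u0645\u064E"
# ain + fatha, lam + shadda + fatha, meem + fatha: "he taught"
allama = "\u0639\u064E\u0644\u0651\u064E\u0645\u064E"

# The voweled forms differ, but both reduce to the same three letters.
print(alima == allama)
print(strip_tashkeel(alima) == strip_tashkeel(allama))
```

The first comparison is false and the second is true, which is exactly the ambiguity that selective tashkeel resolves.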
when tashkeel doesn't matter
In everyday writing — news articles, social media, emails, business correspondence, text messages — Arabic is written without tashkeel. Native readers fill in the vowels from context, the same way English readers can understand "pls snd me the rprt" without full vowels. For a fluent reader, context resolves ambiguity in almost every case.
Adding tashkeel to casual text is considered unusual and even slightly condescending — as if the writer assumes the reader can't parse standard Arabic. The exception is social media posts quoting Quran or hadith, where tashkeel is expected and respected.
how automatic tashkeel tools work
Diacritization tools (like the one bababa is building) use either rule-based systems or machine learning models to predict the correct diacritical marks for each letter in a given text. The challenge is that Arabic is highly ambiguous without context: the unvoweled spelling كتب, from the root ك-ت-ب, can be read as kataba (he wrote), kutiba (it was written), kutub (books), kattaba (he made someone write), or several other forms.
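To make the ambiguity concrete, here is a toy sketch that maps one unvoweled spelling to three of the readings listed above. The lexicon is hand-written for illustration and is not how any particular diacritizer stores its data; a real tool also has to pick one candidate from context, which this example does not attempt.

```python
# Hypothetical, hand-written lexicon: unvoweled spelling -> possible readings.
# Each reading is (voweled form, transliteration, gloss).
CANDIDATES = {
    "\u0643\u062A\u0628": [  # kaf, ta, ba with no marks
        ("\u0643\u064E\u062A\u064E\u0628\u064E", "kataba", "he wrote"),
        ("\u0643\u064F\u062A\u0650\u0628\u064E", "kutiba", "it was written"),
        ("\u0643\u064F\u062A\u064F\u0628", "kutub", "books"),
    ],
}

def strip_marks(text):
    """Drop the core tashkeel marks (U+064B through U+0652)."""
    return "".join(ch for ch in text if not "\u064B" <= ch <= "\u0652")

def readings(word):
    """Return every known vocalization of an unvoweled word, or an empty list."""
    return CANDIDATES.get(strip_marks(word), [])

for voweled, translit, gloss in readings("\u0643\u062A\u0628"):
    # Every candidate reduces to the same three base letters.
    print(voweled, translit, gloss, strip_marks(voweled) == "\u0643\u062A\u0628")
```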
Modern ML-based diacritizers achieve about 95–97% accuracy on formal text. They struggle with names, loanwords, informal Arabic, and words that are genuinely ambiguous without broader context. Rule-based systems like Mishkal are faster but slightly less accurate.
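The figures above do not say how accuracy is measured. One plausible reading is per letter: the share of base characters that received exactly the right marks. The sketch below computes that under simplifying assumptions (only the eight core marks, and every character counted, including spaces); the function names are hypothetical.

```python
def split_marks(text):
    """Split text into (base character, attached marks) pairs."""
    pairs = []
    for ch in text:
        if "\u064B" <= ch <= "\u0652" and pairs:
            base, marks = pairs[-1]
            pairs[-1] = (base, marks + ch)  # attach mark to the previous base
        else:
            pairs.append((ch, ""))
    return pairs

def diacritic_accuracy(predicted, reference):
    """Fraction of base characters whose marks match the reference."""
    pred, ref = split_marks(predicted), split_marks(reference)
    if [b for b, _ in pred] != [b for b, _ in ref]:
        raise ValueError("base letters differ; texts are not comparable")
    if not ref:
        return 1.0
    # Sort marks so shadda+fatha and fatha+shadda count as the same.
    correct = sum(sorted(p) == sorted(r) for (_, p), (_, r) in zip(pred, ref))
    return correct / len(ref)

kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # predicted: "he wrote"
kutiba = "\u0643\u064F\u062A\u0650\u0628\u064E"  # reference: "it was written"
print(diacritic_accuracy(kataba, kutiba))  # only the final letter matches
```

A single wrong reading of a short word can miss most of its letters, which is why a few ambiguous words account for much of the remaining error.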
tashkeel tool (coming soon)
bababa's tashkeel tool will add diacritics to unvoweled arabic text automatically. powered by open-source technology, running through a private backend. coming in a future update.