This HOWTO has been written to help you set up your Linux box to use UTF-8 encoding for various Indic scripts. To use these scripts, you will have to install the IndiX system, developed by NCST, Mumbai, on your machine. I have tested the IndiX system on Exodus GNU/Linux, Red Hat Linux and Mandrake Linux. If you have tested it on a machine running Debian, please let me know and I will include your results in this HOWTO. I want to thank Mr. Keyur Shroff from NCST, Mumbai for allowing me to modify and redistribute his Devanagri-HOWTO.
Please note that Exodus GNU/Linux, developed by the good guys at Centurion Linux, India, will ship with the IndiX system installed, thanks to the Transfer of Technology deal signed by NCST, Mumbai and Centurion Linux Pvt. Ltd.
Almost all of the leading GNU/Linux distributions available today have been localized into various international languages such as French, German, Spanish, Chinese and Arabic. This HOWTO documents the steps involved in localizing your GNU/Linux distribution to the Indic scripts of your choice. To begin with, you should be aware of the complexity involved in localizing any of the Indian languages. Text input in Indian languages differs from text input in English. Perhaps the most significant difference is that in English, each keystroke maps directly onto a letter, and each letter has a unique code. In Indian languages, on the other hand, the 'syllable', the unit of writing that plays the role of a letter, is composed of one or more characters entered through the keyboard.
The syllable is composed of vowels, consonants, modifiers and other special graphic signs. These are encoded just as Roman letters are. The user types in a sequence of vowels, consonants, modifiers and graphic signs, and the machine then composes syllables from them at run time based on language-dependent rules. Every syllable is thus represented in the machine as a unique sequence of vowels, consonants and modifiers. In a text sequence, these characters are stored in logical (phonetic) order.
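To make "stored in logical order" concrete, the short Python sketch below takes one Devanagari syllable, kssi (क्षि), and lists the Unicode characters it is stored as, together with the size of its UTF-8 encoding. The choice of syllable is only an illustration. Nothing here is part of IndiX; it uses only Python's standard `unicodedata` module.

```python
import unicodedata

# One syllable, "kssi", stored in logical (phonetic) order:
# KA, VIRAMA (joins KA to the next consonant), SSA, then the vowel sign I.
syllable = "\u0915\u094D\u0937\u093F"

for ch in syllable:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Every Devanagari code point (U+0900 to U+097F) takes 3 bytes in UTF-8.
encoded = syllable.encode("utf-8")
print(len(syllable), "code points,", len(encoded), "UTF-8 bytes")
```

This prints KA, SIGN VIRAMA, SSA and VOWEL SIGN I in that order, followed by "4 code points, 12 UTF-8 bytes". What the reader sees as a single syllable is four stored characters, which is why one syllable may take several keystrokes.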
Indic characters can combine or change shape depending on their context. A character's appearance is affected by its ordering with respect to other characters, the font used to render it, and the application or system environment. These variables can make Devanagari characters look different from their nominal glyphs (the ones shown in the code charts). Additionally, some characters cause the displayed glyphs to appear in a different order from the stored characters. For example, the Devanagari vowel sign I is stored after its consonant but drawn to its left. This reordering is not commonly seen in non-Indic scripts and occurs independently of any bidirectional reordering that might be required.
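The reordering described above can be illustrated with a deliberately simplified Python sketch. It is not the algorithm that IndiX or any OpenType shaping engine uses. It handles only the pre-base vowel sign I (U+093F) and only the basic consonants U+0915 to U+0939, and the helper names `is_consonant` and `visual_order` are hypothetical. The function takes characters in stored (logical) order and returns them in the left-to-right order in which they would be drawn.

```python
VIRAMA = "\u094D"
VOWEL_SIGN_I = "\u093F"


def is_consonant(ch):
    # Hypothetical helper: basic Devanagari consonants KA..HA (U+0915-U+0939) only;
    # nukta forms and other scripts are ignored in this toy.
    return "\u0915" <= ch <= "\u0939"


def visual_order(text):
    """Toy reordering: move each VOWEL SIGN I in front of its consonant cluster."""
    out = []
    cluster_start = 0  # index in `out` where the current consonant cluster begins
    prev = ""
    for ch in text:
        # A consonant that does not follow a virama starts a new cluster.
        if is_consonant(ch) and prev != VIRAMA:
            cluster_start = len(out)
        if ch == VOWEL_SIGN_I:
            # The pre-base vowel sign is drawn to the left of the whole cluster.
            out.insert(cluster_start, ch)
        else:
            out.append(ch)
        prev = ch
    return "".join(out)
```

For "ki" (KA, I) the result is I, KA. For "kssi" (KA, VIRAMA, SSA, I) the vowel sign moves in front of the whole KA-VIRAMA-SSA cluster, not just in front of SSA. A real renderer also has to deal with half forms, reph and many other cases. Those are well beyond this toy.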
Each syllable has a unique visual representation. However, there are too many syllables to design a glyph for each one individually. A font therefore normally contains certain component glyphs from which a syllable is composed at run time, and the onscreen representation of a syllable is a composition of glyphs from the Indian language font. There is no direct mapping of glyph codes to the consonant, vowel or modifier codes. However, for every syllable (a sequence of consonants, vowels and modifiers) there is a corresponding sequence of glyphs. This constitutes a many-to-many mapping from keystrokes to glyphs, as opposed to the simple one-to-one mapping in Roman scripts.
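The following Python sketch illustrates the many-to-many mapping using a tiny, made-up glyph table. The glyph names (`ka`, `k_ssa`, `i_matra` and so on) and the two-piece decomposition of vowel sign O are hypothetical and do not come from any real font. Real OpenType fonts express such substitutions in their own lookup tables, and this dictionary only imitates the idea. A greedy longest-match lookup lets a three-character conjunct become one glyph, while a single character can become two glyphs. The input is assumed to be already in logical order, and the reordering step is left out.

```python
# Hypothetical glyph table: character sequences -> glyph names (not from a real font).
GLYPH_TABLE = {
    "\u0915": ["ka"],                        # KA -> one glyph
    "\u0937": ["ssa"],                       # SSA -> one glyph
    "\u0915\u094D\u0937": ["k_ssa"],         # KA+VIRAMA+SSA -> one conjunct glyph (many-to-one)
    "\u093F": ["i_matra"],                   # VOWEL SIGN I -> one glyph
    "\u094B": ["aa_matra", "e_matra_top"],   # VOWEL SIGN O -> two glyphs (one-to-many, hypothetical)
}
MAX_KEY = max(len(key) for key in GLYPH_TABLE)


def to_glyphs(text):
    """Map a character sequence to glyph names by greedy longest match."""
    glyphs = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so conjuncts win over their component letters.
        for n in range(min(MAX_KEY, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in GLYPH_TABLE:
                glyphs.extend(GLYPH_TABLE[chunk])
                i += n
                break
        else:
            # No entry in the table: fall back to a nominal glyph named after the code point.
            glyphs.append(f"uni{ord(text[i]):04X}")
            i += 1
    return glyphs
```

For example, `to_glyphs("\u0915\u094D\u0937")` yields the single glyph `k_ssa`, and `to_glyphs("\u0915\u094B")` yields three glyphs for two characters. Neither count matches the number of characters typed.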
The IndiX system developed by NCST, Mumbai enables most applications under the X Window System, irrespective of the toolkit they use, to render Indic characters according to the Unicode standard. IndiX provides support for OpenType fonts and Unicode encoding at the X Window System level, so most existing applications can handle Indic scripts without any modification or recompilation.
Once you have installed the IndiX system by following all the steps in this HOWTO, you will be able to fly across the seven seas and slap that annoying sailor who keeps goin' hic' hic'... Okay, on a more serious note, you will be able to enjoy your Linux experience in Devanagari and other Indic scripts of your choice.