What is Unicode?

Unicode® is a worldwide character set standard designed to allow the global interchange of multilingual digital information. Its designers set out to support all the world’s scripts while accommodating existing national and international character sets. The Unicode Standard has been endorsed by all major hardware and software companies as well as by the International Organization for Standardization® (ISO). In fact, the Unicode Standard and the international standard ISO/IEC 10646 have been kept in step with each other for several years. Since both standards support the same character repertoire, companies can confidently embrace Unicode without worrying about competing standards.

What is Unicode Conformance?

By definition, a process is considered “Unicode conformant” if it can correctly interpret and render a subset of Unicode without misinterpreting or disturbing any other subset. Claims of Unicode conformance in a font can be misleading when there is no common understanding about the number of scripts and/or languages supported. Unicode conformance in a font indicates correctness of interpretation, but not necessarily breadth of coverage. For example, software that correctly interprets only the Devanagari subset of Unicode, and displays the results in an appropriate font, is Unicode conformant. On the other hand, a system that displays any arbitrary 256-character subset of Unicode through the same Latin-1 font does not conform to the Unicode Standard, because it misinterprets every character that is not actually a Latin-1 character.
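The "interpret one subset, leave the rest undisturbed" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the formal conformance clauses of the Unicode Standard: the names DEVANAGARI, is_supported, and split_runs are hypothetical, and the supported subset is assumed to be the Devanagari block (U+0900 to U+097F). The process marks which runs of text it can interpret and passes every other code point through unchanged.

```python
# Assumed supported subset: the Devanagari block, U+0900..U+097F.
DEVANAGARI = range(0x0900, 0x0980)


def is_supported(ch: str) -> bool:
    """Return True if this (hypothetical) process can interpret the character."""
    return ord(ch) in DEVANAGARI


def split_runs(text: str) -> list[tuple[bool, str]]:
    """Split text into (supported, run) pairs without altering any code point."""
    runs: list[tuple[bool, str]] = []
    for ch in text:
        flag = is_supported(ch)
        if runs and runs[-1][0] == flag:
            # Same category as the previous character: extend the current run.
            runs[-1] = (flag, runs[-1][1] + ch)
        else:
            runs.append((flag, ch))
    return runs


# The word "Hindi" in Devanagari, followed by Latin and CJK text.
hindi = "\u0939\u093f\u0928\u094d\u0926\u0940"
text = hindi + " and \u65e5\u672c\u8a9e"

runs = split_runs(text)

# Only the Devanagari run is flagged for interpretation.
print([(flag, len(run)) for flag, run in runs])

# Nothing outside the supported subset was changed or dropped.
assert "".join(run for _, run in runs) == text
```

A renderer built on this sketch would shape and draw only the runs flagged True, and would show the others as a placeholder or hand them to another component, rather than drawing them with glyphs that belong to different characters.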

Characters or glyphs?

Unicode is a character set for the basic interchange of plain text. It contains no attributes regarding language, display format, color, typeface, or any other details about rendering. In this respect, Unicode-encoded text is analogous to ASCII-encoded text. Unicode makes the important distinction between characters and glyphs. For example, if a text contains the sequence of characters n, a, i, ¨, v, e, one would expect the word ‘naïve’ to be rendered. In this case, the character n is represented by the glyph n. More importantly, the single glyph ï is represented by two successive characters in the text: the letter i followed by the diaeresis.

In order to avoid duplication of characters, Unicode encodes text by script, not by language. For instance, the Latin A is used without distinction for text in Catalan, English, Indonesian, Swedish and Swahili. The fact that different languages and cultures may prefer differing display forms for particular letters is left to the rendering process, which may have further information about style, language, locale, and other attributes. To illustrate this design principle, consider that the Hebrew script is used for Modern Hebrew, Biblical Hebrew, Ladino, and Yiddish, and that the Arabic script is used for Arabic, Farsi, Urdu, Kurdish, and Ottoman Turkish. Any given script can represent related and unrelated languages, living and dead languages alike. With such diversity, differences in glyph style are bound to be numerous. Unicode assumes that those differences will be handled by the rendering software, not by the underlying Unicode character code.
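The sequence n, a, i, ¨, v, e from the ‘naïve’ example can be inspected with Python's standard unicodedata module. The sketch below assumes that the ¨ in that sequence stands for the combining mark U+0308 (COMBINING DIAERESIS); the variable names are illustrative only. It shows that the six-character sequence and the five-character precomposed spelling are different strings at the character level, yet become identical after normalization, which is why both are expected to render as the same glyphs.

```python
import unicodedata

# Decomposed spelling: the letter i followed by U+0308 COMBINING DIAERESIS
# (assumed here to be the mark meant by the diaeresis in the example).
decomposed = "nai\u0308ve"

# Precomposed spelling: the single character U+00EF.
precomposed = "na\u00efve"

# Six characters versus five characters.
print(len(decomposed), len(precomposed))

# Compared code point by code point, the two strings differ.
print(decomposed == precomposed)

# After canonical composition (NFC) they are the same string.
composed = unicodedata.normalize("NFC", decomposed)
print(composed == precomposed)

# The character names make the two-character representation of the glyph explicit.
for ch in decomposed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Text-processing code that compares or searches strings usually normalizes both sides first, so that the choice between the two spellings does not affect the result.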

Monotype Imaging Non-Latin Modular Solutions

The font foundry Monotype Imaging® has produced one of the world’s most comprehensive multilingual font libraries. The non-Latin font collection allows Unicode fonts to be grouped into useful modules that cover the language support a customer requires. For example, suppose that a customer needs to publish a report in the major languages of Northern India. By gathering the main scripts of that region (Bengali, Devanagari, Gujarati, Gurmukhi) into a font module, the customer can be assured of access to all the necessary languages: Assamese, Bengali, Bihari, Gujarati, Hindi, Kashmiri, Marathi and Panjabi. If another customer required support for all Slavic languages in one simple package, a font module could be composed that includes the Cyrillic script and the East European subset of the Latin script. With this module, one could write in all of the following languages: Belorussian, Bulgarian, Croatian, Czech, Macedonian, Polish, Russian, Serbian, Slovak, Slovene, Sorbian and Ukrainian.

