Unicode

Alias(es): Universal Coded Character Set (UCS, ISO/IEC 10646)
Language(s): International
Standard: Unicode Standard
Preceded by: ISO/IEC 8859, various others

Unicode, formally The Unicode Standard,[note 1][note 2] is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard is maintained by the Unicode Consortium. As of version 15.0, it defines 149,186 characters[3][4] covering 161 modern and historic scripts, as well as symbols, thousands of emoji[5] (including colored ones), and non-visual control and formatting codes.

Unicode's success at unifying character sets has led to its widespread and predominant use in the internationalization and localization of computer software. The standard has been implemented in many recent technologies, including modern operating systems, XML, JSON, and most modern programming languages, some of which support it only in UTF-8 form.

The Unicode character repertoire is synchronized with ISO/IEC 10646, each being code-for-code identical to the other. The Unicode Standard, however, includes more than just the base code. Alongside the character encodings, the Consortium's official publication includes a wide variety of details about the scripts and how to display them, such as normalization rules, decomposition, collation, rendering, and bidirectional text display order for multilingual texts.[6] The Standard also includes reference data files and visual charts to help developers and designers correctly implement the repertoire.
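As a rough illustration of the normalization, decomposition, and bidirectional data mentioned above, the following Python sketch queries the `unicodedata` module from the standard library. The helper name `same_text` and the sample characters are chosen here for illustration only and do not come from the Standard; the data returned reflects whichever Unicode version the Python installation ships with.

```python
import unicodedata

# Two ways to write "é": one precomposed code point, or a base letter
# followed by a combining accent. They look alike but differ as strings.
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" + COMBINING ACUTE ACCENT


def same_text(a, b, form="NFC"):
    """Compare two strings after applying one normalization form.

    Hypothetical helper for illustration; `form` is any form accepted by
    unicodedata.normalize ("NFC", "NFD", "NFKC", "NFKD").
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(composed == decomposed)              # raw comparison of code points
print(same_text(composed, decomposed))     # comparison after normalization
print(unicodedata.decomposition(composed)) # canonical decomposition mapping
print(unicodedata.bidirectional("A"))      # bidirectional class of a Latin letter
print(unicodedata.bidirectional("\u05d0")) # bidirectional class of Hebrew alef
```

Comparing strings only after normalizing both to the same form is a common way to avoid treating visually identical text as different.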

Unicode text can be stored using several different encodings, each of which translates the character codes into sequences of bytes. The Unicode Standard defines three encodings (UTF-8, UTF-16, and UTF-32), but several others exist, most of them variable-length. The most common encodings are UTF-8, which is ASCII-compatible; UTF-16, which is not ASCII-compatible but is compatible with the obsolete UCS-2; and GB18030, the Chinese encoding standard, which is not part of The Unicode Standard but is used in China and implements Unicode fully.
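The way one character code becomes different byte sequences under different encodings can be sketched in Python with the built-in `str.encode` method. The helper name `encode_all` and the three sample characters are illustrative assumptions; `utf-16-be` is used so that no byte order mark is added to the output.

```python
def encode_all(text, encodings=("utf-8", "utf-16-be", "gb18030")):
    """Return a dict mapping each encoding name to the encoded bytes.

    Hypothetical helper for illustration; the encoding names are the
    codec names that Python's standard library uses.
    """
    return {enc: text.encode(enc) for enc in encodings}


# One ASCII letter, one character from the Basic Multilingual Plane,
# and one character from a supplementary plane.
for ch in ("A", "\u20ac", "\U0001F600"):
    print(f"U+{ord(ch):04X}")
    for enc, data in encode_all(ch).items():
        print(f"  {enc:<10} {data.hex()} ({len(data)} bytes)")
```

In this sketch the letter A occupies a single byte in UTF-8 and GB18030 but two bytes in UTF-16, which is the sense in which the first two are ASCII-compatible and UTF-16 is not.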



  1. ^ "Unicode Technical Report #28: Unicode 3.2". Unicode Consortium. 2002-03-27. Retrieved 2022-06-23.
  2. ^ Jenkins, John H. (2021-08-26). "Unicode Standard Annex #45: U-source Ideographs". Unicode Consortium. Retrieved 2022-06-23. 2.2 The Source Field
  3. ^ "Unicode 15.0.0". www.unicode.org.
  4. ^ "Unicode Character Count V15.0". www.unicode.org.
  5. ^ "Emoji Counts, v15.0". unicode.org. Retrieved 2023-01-30.
  6. ^ "The Unicode Standard: A Technical Introduction". Retrieved 2010-03-16.

From Wikipedia, the free encyclopedia
