Unicode

Alias(es)	Universal Coded Character Set (UCS); ISO/IEC 10646
Language(s)	168 scripts
Standard	Unicode Standard
Encoding formats	UTF-8; UTF-16; GB18030; UTF-32; BOCU; SCSU; UTF-EBCDIC (uncommon); UTF-7; UTF-1 (obsolete)
Preceded by	ISO/IEC 8859, among others

Unicode (also known as The Unicode Standard and TUS^[1]^[2]) is a character encoding standard, maintained by the Unicode Consortium, that is designed to support the use of text in all of the world's writing systems that can be digitized. Version 16.0^[A] defines 154,998 characters and 168 scripts^[3] used in various ordinary, literary, academic, and technical contexts.

Unicode has largely supplanted the earlier patchwork of incompatible character sets used in different locales and on different computer architectures. The entire repertoire of those sets, plus many additional characters, was merged into the single Unicode set. Unicode is used to encode the vast majority of text on the Internet, including most web pages, and Unicode support has become a common consideration in contemporary software development. The code space can ultimately accommodate more than 1.1 million characters: it spans 1,114,112 code points, from U+0000 to U+10FFFF.
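The figure of roughly 1.1 million follows from the layout of the code space, which is divided into 17 planes of 65,536 code points each. The following is a minimal sketch of that arithmetic in Python; the constant and function names are illustrative and are not taken from the standard.

```python
import sys

# Illustrative names; the standard describes the layout, not these identifiers.
PLANES = 17                      # plane 0 (the BMP) plus 16 supplementary planes
CODE_POINTS_PER_PLANE = 0x10000  # 65,536 code points per plane

code_space_size = PLANES * CODE_POINTS_PER_PLANE

# The surrogate range U+D800..U+DFFF is reserved for UTF-16 and never
# represents characters on its own, so it is excluded from scalar values.
surrogate_count = 0xDFFF - 0xD800 + 1
scalar_value_count = code_space_size - surrogate_count


def is_code_point(value: int) -> bool:
    """Return True if value lies inside the Unicode code space."""
    return 0 <= value <= 0x10FFFF


print(code_space_size)         # 1114112
print(scalar_value_count)      # 1112064
print(hex(sys.maxunicode))     # 0x10ffff, the largest code point Python accepts
print(is_code_point(0x110000)) # False
```

Only a fraction of this space is assigned: the 154,998 characters of version 16.0 occupy well under a fifth of the available code points.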

The Unicode character repertoire is synchronized with ISO/IEC 10646, and the two are code-for-code identical. However, The Unicode Standard is more than a repertoire within which characters are assigned. To aid developers and designers, the standard also provides charts and reference data, as well as annexes that explain concepts germane to various scripts and give guidance for their implementation. Topics covered by these annexes include character normalization, character composition and decomposition, collation, and directionality.^[4]
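To make normalization, composition and decomposition, and directionality concrete, here is a small sketch using Python's standard `unicodedata` module, which exposes part of the Unicode Character Database. The helper name `same_text` is hypothetical, and the example covers only the NFC and NFD forms.

```python
import unicodedata

# Two encodings of the same user-perceived character "é".
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The raw strings differ, but they are equal once normalized.
print(precomposed == decomposed)           # False
print(same_text(precomposed, decomposed))  # True

# Composition (NFC) and decomposition (NFD) convert between the two spellings.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True

# Directionality: each character carries a bidirectional class.
print(unicodedata.bidirectional("a"))       # L (left-to-right)
print(unicodedata.bidirectional("\u05d0"))  # R (right-to-left, HEBREW LETTER ALEF)
```

Collation is not covered here; Python's standard library does not implement the Unicode Collation Algorithm.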

Unicode encodes 3,790 emoji, whose continued development is conducted by the Consortium as part of the standard.^[5] The widespread adoption of Unicode was in large part responsible for the initial popularization of emoji outside of Japan.^[citation needed]

Unicode text is processed and stored as binary data using one of several encodings, which define how to translate the standard's abstract character codes into sequences of bytes. The Unicode Standard itself defines three encodings: UTF-8, UTF-16,^[a] and UTF-32, though several others exist. UTF-8 is the most widely used by a large margin, in part because of its backward compatibility with ASCII.
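The difference between the three encodings is easiest to see by encoding the same characters with each. The sketch below uses Python's built-in codecs; the helper name `encoded_forms` is hypothetical, and the big-endian variants are chosen only so that no byte order mark appears in the output.

```python
def encoded_forms(text: str) -> dict:
    """Return the bytes of text, as spaced hex, in three Unicode encodings."""
    codecs = ("utf-8", "utf-16-be", "utf-32-be")
    return {name: text.encode(name).hex(" ") for name in codecs}


# "A" (U+0041) is a single byte in UTF-8, identical to its ASCII encoding.
print(encoded_forms("A"))
# {'utf-8': '41', 'utf-16-be': '00 41', 'utf-32-be': '00 00 00 41'}

# The euro sign (U+20AC) needs three bytes in UTF-8 and two in UTF-16.
print(encoded_forms("\u20ac"))
# {'utf-8': 'e2 82 ac', 'utf-16-be': '20 ac', 'utf-32-be': '00 00 20 ac'}

# U+1F600 lies outside plane 0, so UTF-16 uses a surrogate pair.
print(encoded_forms("\U0001F600"))
# {'utf-8': 'f0 9f 98 80', 'utf-16-be': 'd8 3d de 00', 'utf-32-be': '00 01 f6 00'}

# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8.
print("ASCII".encode("ascii") == "ASCII".encode("utf-8"))  # True
```

UTF-32 always spends four bytes per code point, while UTF-8 and UTF-16 are variable-width, which is one reason UTF-8 is compact for ASCII-heavy text.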

[1] "Unicode Technical Report #28: Unicode 3.2". Unicode Consortium. 2002-03-27. Retrieved 2022-06-23.
[2] Jenkins, John H. (2021-08-26). "Unicode Standard Annex #45: U-source Ideographs". Unicode Consortium. §2.2 The Source Field. Retrieved 2022-06-23.
[3]
- "Unicode Character Count V16.0". The Unicode Consortium. 2024-09-10.
- "Unicode 16.0 Versioned Charts Index". The Unicode Consortium. 2024-09-10.
- "Supported Scripts". The Unicode Consortium. 2024-09-10. Retrieved 2024-09-11.
[4] "The Unicode Standard: A Technical Introduction". 2019-08-22. Retrieved 2024-09-11.
[5] "Emoji Counts, v16.0". The Unicode Consortium. Retrieved 2024-09-10.

From Wikipedia, the free encyclopedia · View on Wikipedia