Character encoding

Punched tape with the word "Wikipedia" encoded in ASCII. The presence and absence of a hole represent 1 and 0, respectively; for example, "W" is encoded as "1010111".

Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using digital computers.^[1] The numerical values that make up a character encoding are known as "code points" and collectively comprise a "code space", a "code page", or a "character map".
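
As an illustration of assigning numbers to characters, the following minimal Python sketch looks up the code point of each character in a word and renders it as a 7-bit pattern, which is the width ASCII uses. The variable names (`word`, `table`, `w_bits`) are illustrative choices and not part of any standard.

```python
# Map each character of a word to its code point and a 7-bit binary pattern.
# ord() returns the Unicode code point, which coincides with ASCII for
# characters in the ASCII range.
word = "Wikipedia"

table = []
for ch in word:
    code_point = ord(ch)               # character -> number
    bits = format(code_point, "07b")   # number -> 7-bit binary string
    table.append((ch, code_point, bits))

for ch, code_point, bits in table:
    print(ch, code_point, bits)

# The bit pattern for "W" on its own.
w_bits = format(ord("W"), "07b")
```

The reverse mapping, from number back to character, is available through `chr()`, so `chr(ord("W"))` returns `"W"` again.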

Early character codes associated with the optical or electrical telegraph could represent only a subset of the characters used in written languages, sometimes restricted to uppercase letters, numerals, and some punctuation. The low cost of digital representation of data in modern computer systems allows more elaborate character codes (such as Unicode) which represent most of the characters used in many written languages. Character encoding using internationally accepted standards permits worldwide interchange of text in electronic form.

The most widely used character encoding on the web is UTF-8, used in 98.2% of surveyed websites, as of May 2024.^[2] In application programs and operating system tasks, both UTF-8 and UTF-16 are popular options.^[3]^[4]
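
To make the difference between UTF-8 and UTF-16 concrete, the sketch below encodes a few sample characters with Python's built-in codecs and compares the number of bytes each encoding produces. The helper name `encoded_lengths` and the sample characters are assumptions chosen for illustration; `"utf-16-le"` is used so that no byte order mark is added to the output.

```python
def encoded_lengths(text: str) -> dict:
    """Return the byte length of `text` under UTF-8 and UTF-16 (little-endian)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # -le avoids the 2-byte BOM
    }

# Sample characters from different parts of the Unicode code space.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, emoji

for ch in samples:
    sizes = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}", sizes["utf-8"], sizes["utf-16"])

# Decoding with the matching codec recovers the original text.
round_trip = "\u20ac".encode("utf-8").decode("utf-8")
```

UTF-8 uses one to four bytes per code point, while UTF-16 uses two or four, so ASCII-heavy text is typically smaller in UTF-8.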

[1] "Character Encoding Definition". The Tech Terms Dictionary. 24 September 2010.
[2] "Usage Survey of Character Encodings broken down by Ranking". W3Techs. Retrieved 29 April 2024.
[3] "Charset". Android Developers. Retrieved 2 January 2021. "Android note: The Android platform default is always UTF-8."
[4] Galloway, Matt (9 October 2012). "Character encoding for iOS developers. Or UTF-8 what now?". www.galloway.me.uk. Retrieved 2 January 2021. "in reality, you usually just assume UTF-8 since that is by far the most common encoding."



From Wikipedia, the free encyclopedia