Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using computers.[1] The numerical values that make up a character encoding are known as code points, and collectively they comprise a code space, a code page, or a character map.
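As a minimal illustration of the character-to-number mapping described above, the following Python sketch looks up the Unicode code point of each character in a short string and then maps the numbers back to characters. The sample string and the variable names are chosen here for illustration only; `ord` and `chr` are Python built-ins.

```python
# Sample text: LATIN CAPITAL LETTER A, LATIN SMALL LETTER E WITH ACUTE, EURO SIGN.
# Escapes are used so the example does not depend on the file's own encoding.
text = "A\u00e9\u20ac"

# ord() returns the Unicode code point (an integer) for a single character.
code_points = [ord(ch) for ch in text]
print(code_points)  # [65, 233, 8364]

# chr() performs the reverse mapping, from code point back to character.
restored = "".join(chr(cp) for cp in code_points)
print(restored == text)  # True
```

The code points here are Unicode's; a different encoding, such as a legacy code page, may assign different numbers to the same characters.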
Early character encodings that originated with optical or electrical telegraphy and in early computers could represent only a subset of the characters used in written languages, sometimes just uppercase letters, numerals, and some punctuation. Over time, character encodings capable of representing more characters were created, such as ASCII, the ISO/IEC 8859 encodings, various computer vendor encodings, and Unicode encodings such as UTF-8 and UTF-16.
The most popular character encoding on the World Wide Web is UTF-8, which is used in 98.2% of surveyed web sites, as of May 2024.[2] In application programs and operating system tasks, both UTF-8 and UTF-16 are popular options.[3]
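To make the difference between these encodings concrete, the sketch below encodes the same short string as ASCII, UTF-8, and UTF-16 using Python's built-in `str.encode`. The sample string is an arbitrary choice for illustration, and the little-endian UTF-16 variant without a byte order mark (`utf-16-le`) is assumed so that the byte counts are easy to compare.

```python
# "Hi " followed by the EURO SIGN (U+20AC), which ASCII cannot represent.
text = "Hi \u20ac"

# ASCII covers only 128 code points, so encoding the euro sign fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 is variable-width: 1 byte each for "H", "i", " " and 3 bytes for the euro sign.
utf8_bytes = text.encode("utf-8")

# UTF-16 uses 2 bytes for each of these four characters (all are in the Basic Multilingual Plane).
utf16_bytes = text.encode("utf-16-le")

print(ascii_ok)                            # False
print(utf8_bytes.hex(), len(utf8_bytes))   # 486920e282ac 6
print(utf16_bytes.hex(), len(utf16_bytes)) # 48006900200020ac... see note below
```

The exact UTF-16 byte sequence depends on the byte order chosen; with `utf-16-le` the result is 8 bytes long, while plain `utf-16` in Python also prepends a 2-byte byte order mark.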
On the Android platform, the default character encoding is always UTF-8.