From MSDN: "Unicode is a 16-bit, fixed-width character encoding standard that
encompasses virtually all of the characters commonly used on computers today.
This includes most of the world's written languages, plus publishing characters,
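mathematical and technical symbols, and punctuation marks."
Note that this MSDN description is dated: Unicode now defines code points up to
Hex 10FFFF, so 16 bits are no longer enough to hold every character.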
From Unicode.org: "Computers ... store letters and other characters by
assigning a number for each one. Before Unicode was invented, there were
hundreds of different encoding systems for assigning these numbers.
No single encoding could contain enough characters...
Unicode provides a unique number for every character,
no matter what the platform, no matter what the program, no matter what the language."
For example, the basic Latin letter "A" has the code Hex 0041 (65), the Russian
letter "Ж" has the code Hex 0416 (1046), and the Chinese character "㊥"
has the code Hex 32A5 (12965).
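The three code points quoted above can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are made up for the example, and the characters are written as escapes so the snippet does not depend on how this text itself is encoded.

```python
# The three example characters, written as escapes:
# "A" (Hex 0041), the Russian letter at Hex 0416, and the character at Hex 32A5.
examples = ["\u0041", "\u0416", "\u32a5"]

for ch in examples:
    code_point = ord(ch)  # ord() returns the Unicode code point as an integer
    # Show the code point in hex (four digits) and in decimal.
    print(f"Hex {code_point:04X} ({code_point})")
```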
UTF-8 (Unicode Transformation Format, 8-bit encoding form) is the recommended
format for sending Unicode-based data across networks, in particular the Internet.
UTF-8 represents a Unicode value as a sequence of 1 to 4 bytes: values up to
Hex FFFF need at most 3 bytes, and values from Hex 10000 to 10FFFF need 4.
Unicode characters in the range Hex 0000 to 007F are encoded simply as bytes
00 to 7F. This means that files and strings which contain only 7-bit ASCII
characters have the same encoding under both ASCII and UTF-8.
Therefore, the Unicode character Hex 0041 ("A") is encoded in UTF-8 as the single byte Hex 41.
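The ASCII compatibility described above is easy to confirm. The following is a minimal sketch using Python's built-in codecs; the sample string is arbitrary and chosen only for illustration.

```python
# Any string made only of 7-bit ASCII characters (sample text is arbitrary).
text = "Plain ASCII text ending in A"

ascii_bytes = text.encode("ascii")  # one byte per character, values 00 to 7F
utf8_bytes = text.encode("utf-8")   # UTF-8 encoding of the same string

# For pure ASCII input the two byte sequences are identical.
same_encoding = ascii_bytes == utf8_bytes
print(same_encoding)

# The final "A" is stored as the single byte Hex 41 in both encodings.
print(utf8_bytes[-1:].hex().upper())
```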
Unicode characters in the range Hex 0080 to 07FF are encoded as a sequence of two bytes.
For example, the Unicode character Hex 0416 ("Ж")
is encoded as Hex D0 96. Unicode characters in the range Hex 0800 to FFFF are encoded
as a sequence of three bytes. For example, the Unicode character Hex 32A5 ("㊥")
is encoded as Hex E3 8A A5.
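To make the byte layouts behind these ranges concrete, here is a small hand-written encoder. It is a sketch, not a replacement for a real codec: the function name utf8_encode is hypothetical, it covers only the three ranges described above (Hex 0000 to FFFF), and it does not reject the surrogate values Hex D800 to DFFF as a full implementation would. The results are compared against Python's built-in encoder.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point in the range Hex 0000 to FFFF as UTF-8 (sketch)."""
    if code_point < 0 or code_point > 0xFFFF:
        raise ValueError("this sketch only handles Hex 0000 to FFFF")
    if code_point <= 0x7F:
        # One byte: 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([
            0xC0 | (code_point >> 6),
            0x80 | (code_point & 0x3F),
        ])
    # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xE0 | (code_point >> 12),
        0x80 | ((code_point >> 6) & 0x3F),
        0x80 | (code_point & 0x3F),
    ])


# Check the three examples from the text against the built-in codec.
for cp in (0x0041, 0x0416, 0x32A5):
    encoded = utf8_encode(cp)
    assert encoded == chr(cp).encode("utf-8")
    print(f"Hex {cp:04X} -> {encoded.hex(' ').upper()}")
```

The leading bits of the first byte (0, 110, or 1110) tell a decoder how many bytes the sequence has, and every following byte starts with 10, which is why a single-byte ASCII value can never be mistaken for part of a longer sequence.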