UTF-8

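UTF-8 is a standard for encoding Unicode characters using a variable number of bytes. A fixed-width encoding that covers every Unicode character, such as UTF-32, spends 4 bytes on each character, even on common ones like the letters of the English alphabet. To save space, UTF-8 uses 1, 2, 3 or 4 bytes depending on the character: ASCII characters take a single byte, and less common characters take more. The leading bits of the first byte tell the interpreting system how many bytes were used to encode the character (0 for one byte, 110 for two, 1110 for three, 11110 for four), and every following byte of the same character begins with the bits 10.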
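
The variable length and the indicator bits can be made visible with a short Python sketch. It is only an illustration: the function names `utf8_sequence_length` and `describe` are made up for this example, the four sample characters are arbitrary choices, and the check looks only at the bit pattern of the first byte, so it does not reject every byte that a strict decoder would refuse.

```python
def utf8_sequence_length(first_byte: int) -> int:
    """Return the sequence length signalled by the leading bits of the first byte.

    Hypothetical helper for illustration; it inspects the prefix pattern only.
    """
    if (first_byte & 0b1000_0000) == 0b0000_0000:  # 0xxxxxxx
        return 1
    if (first_byte & 0b1110_0000) == 0b1100_0000:  # 110xxxxx
        return 2
    if (first_byte & 0b1111_0000) == 0b1110_0000:  # 1110xxxx
        return 3
    if (first_byte & 0b1111_1000) == 0b1111_0000:  # 11110xxx
        return 4
    # 10xxxxxx is a continuation byte; 11111xxx never occurs in UTF-8.
    raise ValueError(f"{first_byte:08b} is not a UTF-8 lead byte")


def describe(ch: str) -> tuple[int, str]:
    """Encode one character and return (byte count, bytes written in binary)."""
    encoded = ch.encode("utf-8")
    bits = " ".join(f"{b:08b}" for b in encoded)
    return len(encoded), bits


if __name__ == "__main__":
    samples = [
        "A",           # Latin capital letter A
        "\u00e9",      # e with acute accent
        "\u20ac",      # euro sign
        "\U0001F600",  # grinning face emoji
    ]
    for ch in samples:
        count, bits = describe(ch)
        lead = ch.encode("utf-8")[0]
        # The length read from the first byte matches the real byte count.
        assert utf8_sequence_length(lead) == count
        print(f"U+{ord(ch):04X}: {count} byte(s): {bits}")
```

Running the script prints one line per sample character. The four samples take 1, 2, 3 and 4 bytes respectively, and in each line the first byte starts with the matching prefix while the remaining bytes all start with 10.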