[Solved] Is this case a weird UTF-8 encoding conversion?


C0 is an invalid lead byte in well-formed UTF-8: it starts a two-byte sequence, but every sequence it can start is an overlong encoding, so conforming decoders must reject it. A bad decoder that accepts it anyway would interpret C0 B1 as ASCII 31h (the character 1); see the sketch below the quote.

Quoting Wikipedia:

… (C0 and C1) could only be used for an invalid “overlong encoding” of ASCII characters (i.e., trying to encode a 7-bit ASCII value between 0 and 127 using two bytes instead of one) …
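
To make this concrete, here is a minimal Python sketch (my own illustration, not from the original post) showing that a strict decoder rejects C0 B1, while a naive decoder that blindly applies the two-byte bit layout recovers the character 1:

```python
# The overlong two-byte sequence in question: C0 B1
data = bytes([0xC0, 0xB1])

# A conforming decoder rejects it outright.
try:
    data.decode("utf-8")
except UnicodeDecodeError as e:
    print("strict decoder:", e.reason)  # 'invalid start byte'

# A lax decoder that applies the two-byte layout anyway
# (110xxxxx 10yyyyyy -> code point xxxxxyyyyyy) would compute:
lead, cont = data
codepoint = ((lead & 0x1F) << 6) | (cont & 0x3F)
print(hex(codepoint), repr(chr(codepoint)))  # 0x31 '1'
```

Running it prints `0x31 '1'` for the lax path, which is exactly the overlong-encoding problem: the same character becomes reachable through more than one byte sequence, which is why the standard forbids C0 and C1 entirely.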
