C0 is an invalid start byte for a two-byte UTF-8 sequence, but if a bad UTF-8 decoder accepted it anyway, C0 B1 would be interpreted as ASCII 31h (the character '1').
Quoting Wikipedia:
…(C0 and C1) could only be used for an invalid “overlong encoding” of ASCII characters (i.e., trying to encode a 7-bit ASCII value between 0 and 127 using two bytes instead of one)…
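To make the arithmetic concrete, here is a minimal Python sketch (the byte values and names are just for illustration): a strict decoder rejects C0 outright, while applying the two-byte formula by hand shows the overlong result would be '1'.

```python
# Overlong encoding demo: C0 B1 is invalid UTF-8, but the two-byte
# formula (110xxxxx 10yyyyyy -> xxxxxyyyyyy) would yield U+0031 ('1').

data = bytes([0xC0, 0xB1])

# A conforming decoder refuses the sequence outright.
try:
    data.decode("utf-8")
except UnicodeDecodeError as e:
    print("strict decoder:", e.reason)  # 'invalid start byte'

# What a lenient decoder that ignored the overlong rule would compute:
codepoint = ((data[0] & 0x1F) << 6) | (data[1] & 0x3F)
print(hex(codepoint), chr(codepoint))   # 0x31 '1'
```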