You have what is known as a double encoding.
You have the three character sequence “你好吗” which you correctly point out is encoded in UTF-8 as E4BDA0 E5A5BD E59097.
But now, start encoding each byte of THAT encoding in UTF-8. Start with E4. What is that codepoint in UTF-8? Try it! It’s C3 A4!
You get the idea…. 🙂
Here is a Java app which illustrates this:
public class DoubleEncoding {
public static void main(String[] args) throws Exception {
byte[] encoding1 = "你好吗".getBytes("UTF-8");
String string1 = new String(encoding1, "ISO8859-1");
for (byte b : encoding1) {
System.out.printf("%2x ", b);
}
System.out.println();
byte[] encoding2 = string1.getBytes("UTF-8");
for (byte b : encoding2) {
System.out.printf("%2x ", b);
}
System.out.println();
}
}
1
solved unable to show image characters in java class