[Voiceglue] More on UTF-8

emiliano esposito emiespo at tiscali.it
Sat Dec 13 06:08:02 EST 2008


I've tried with a very simple prompt:

<prompt>Try this: è</prompt>

That letter becomes 0xC3 0xA8 in UTF-8, but the error log says:

11:58:36:1229165 CRIT VOICEGLU :2: parser error : Input is not proper 
UTF-8, indicate encoding !
Bytes: 0xC3 0x30 0x32 0x78
Try this: �02x�02x
^

(the same stuff is reported elsewhere).
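
For reference, the expected encoding can be double-checked with a short Python sketch. Python is used here only for illustration; it is not part of Voiceglue or OpenVXI, and the variable names are made up for this example.

```python
# Encode the character from the prompt and inspect its UTF-8 bytes.
text = "\u00e8"  # è, LATIN SMALL LETTER E WITH GRAVE
encoded = text.encode("utf-8")

# Hex view of the two bytes.
hex_bytes = " ".join(f"0x{b:02X}" for b in encoded)
print(hex_bytes)  # 0xC3 0xA8

# Bit view: a 110xxxxx lead byte followed by a 10yyyyyy continuation byte.
bit_patterns = [f"{b:08b}" for b in encoded]
print(bit_patterns)  # ['11000011', '10101000']
```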

Notice the sequence "0xC3 0x30 0x32 0x78". It's not valid UTF-8! After 
0xC3 there must be a continuation byte of the binary form 10yyyyyy 
(0x80 to 0xBF). Instead, only the leading 0xC3 is correct; it is followed 
by 0x30 0x32 0x78, which is "02x" in plain ASCII and was NOT part of the 
original string.
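
A minimal Python sketch of that check, assuming only the byte values quoted from the log above. The helper names is_continuation and decodes_as_utf8 are hypothetical and exist only for this example.

```python
def is_continuation(byte: int) -> bool:
    """Return True if the byte has the binary form 10yyyyyy (0x80..0xBF)."""
    return (byte & 0xC0) == 0x80


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the whole byte string is well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


expected = bytes([0xC3, 0xA8])            # what the prompt should contain
logged = bytes([0xC3, 0x30, 0x32, 0x78])  # what the parser reported

print(is_continuation(expected[1]))  # True
print(is_continuation(logged[1]))    # False
print(decodes_as_utf8(expected))     # True
print(decodes_as_utf8(logged))       # False
print(logged[1:].decode("ascii"))    # 02x
```

The second byte of the logged sequence fails the continuation test, which is consistent with the parser error: the 0xC3 lead byte survived, but 0xA8 did not follow it.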

Why does this happen? Is it an OpenVXI flaw or something else?


