I Hate Unicode

I’m sure I told you this a few months ago, but I REALLY hate Unicode. Sure, it’s cool to support all the languages. Yes, One (err Two… No Wait! Three!) Encoding(s) to Rule Them All is better than a billion similar and incompatible encodings, but it really is a pain in the ass to use.

I’ve got a bunch of XML files that are encoded in a bunch of different things, and I’m having a hell of a time getting Perl to convert them all from whatever (ASCII, windows-1252, Big5, Shift-JIS, etc.) to UTF-8. It will read them, imply that Unicode is how they’re being represented internally, and then promptly output them back in their original encoding.
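For the record, here is roughly what I'd expect the conversion to look like with the core Encode module. This is a minimal sketch, not a tested fix for my actual files: to_utf8 is a made-up helper name, it assumes you already know the source encoding, and the regex that rewrites the encoding= attribute in the XML declaration is my own quick hack, not anything Encode or LibXML does for you.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of Encode): take raw bytes in a known
# source encoding and return the same text as UTF-8 bytes.
sub to_utf8 {
    my ($bytes, $from) = @_;

    # decode() turns bytes into a Perl character string. FB_CROAK makes it
    # die on malformed input instead of quietly substituting characters.
    my $chars = decode($from, $bytes, Encode::FB_CROAK);

    # Assumption: if there is an XML declaration naming the old encoding,
    # rewrite it so the file does not lie about itself afterwards.
    $chars =~ s/\A(<\?xml[^>]*?encoding=)(["'])[^"']*\2/$1${2}UTF-8$2/;

    # encode() turns the character string back into bytes, this time UTF-8.
    return encode('UTF-8', $chars, Encode::FB_CROAK);
}

# 0xE9 is "e with acute accent" in windows-1252 (cp1252 in Encode's naming).
my $latin = "<?xml version=\"1.0\" encoding=\"windows-1252\"?><p>caf\xE9</p>";
my $utf8  = to_utf8($latin, 'cp1252');
```

The point is that the decode and the encode are two explicit steps with the target encoding spelled out. If whatever writes the file decides the output encoding for you, you can end up right back where you started.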

And of course LibXML refuses to use anything but Unicode. Why? Because. I’m sure there’s some reason, but let’s face it. All encodings are ASCII compatible, so just deal. Why it even cares what encoding the file is in I don’t know. Just match on the byte sequence and be done with it.

This whole thing seems to be more trouble than it’s worth.

Your Unicode-related QOTD:

ISO-8859-8-1 [Hebrew]
None of the Encode team knows Hebrew enough (ISO-8859-8, cp1255 and MacHebrew are supported because and just because there were mappings available at http://www.unicode.org/). Contributions welcome.

ISIRI 3342, Iran System, ISIRI 2900 [Farsi]
Ditto.

Thai encoding TCVN
Ditto.

– http://perldoc.perl.org/Encode/Supported.html