Arc Forum
1 point by almkglor 6072 days ago | link | parent

> When you talk about a character represented by several code points, are you talking about Unicode surrogates for characters > 65535?

Actually, I'm talking about so-called "combining characters": http://en.wikipedia.org/wiki/Combining_character

Normalization... hahahaha unicode unicode headaches headaches! http://en.wikipedia.org/wiki/Unicode_normalization
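
To make the headache concrete, here is a minimal sketch in Python (not Arc, and not anything from the Arc implementation), using only the standard `unicodedata` module. It shows one user-visible character, "é", spelled either as a single precomposed code point or as a base letter plus a combining accent, and how NFC/NFD normalization maps between the two. The helper name `code_points` is made up for illustration.

```python
import unicodedata

# Two spellings of the same user-visible character "é".
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
combining = "e\u0301"    # "e" followed by U+0301 COMBINING ACUTE ACCENT


def code_points(s):
    """Hypothetical helper: list the code points of s in U+XXXX form."""
    return [f"U+{ord(ch):04X}" for ch in s]


print(code_points(precomposed))   # ['U+00E9']
print(code_points(combining))     # ['U+0065', 'U+0301']

# A naive comparison sees two different strings.
print(precomposed == combining)   # False

# NFC composes where possible; NFD decomposes where possible.
print(unicodedata.normalize("NFC", combining) == precomposed)   # True
print(unicodedata.normalize("NFD", precomposed) == combining)   # True
```

So string length and equality both depend on which form the text happens to be in, unless everything is normalized to one form first.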



4 points by kens2 6072 days ago | link

Oh, Unicode combining characters and normalization. I classify that as "somebody else's problem." Specifically, if you're writing a font rendering engine, it's your problem. If you're writing an Arc compiler, it's not your problem. If you want complete Unicode library support in your language (like MzScheme's normalization functions string-normalize-nfd, etc.), then you just use an existing library such as ICU, and it's not your problem. ICU: http://www-306.ibm.com/software/globalization/icu/index.jsp

-----