I’ve seen misconceptions about Unicode crop up regularly in posts discussing it. One very common misconception I’ve seen is that code points have cross-language intrinsic meaning. It usually comes up when people are comparing UTF8 and UTF32. Folks start implying that code points mean something, and that O(1) indexing or slicing at code point boundaries is a useful operation. I’ve also seen this assumption manifest itself in actual programs which make incorrect assumptions about the nature of code points and mess things up when fed non-Latin text.















