Using a non-UTF-8 CJK locale with XTerm
XTerm is a UTF-8-only terminal emulator when compiled with wide-character support: when it runs under a non-UTF-8 locale (like zh_CN.GB18030), it invokes an external program, `luit`, to perform the encoding conversion. The good part of such a design is that we only need to care about Unicode code points. For example, XTerm's default double-click selection is based on character classes, so alphanumerics and characters like '.' and '/' are not grouped together; typically, though, we want a whole escape-safe shell argument to be selected. So I add the following to ~/.Xresources:
XTerm*charClass: 43-47:48,58:48,64:48
to assign all of the non-metacharacters in csh (except '=', which I don't want in practice) to the ALNUM class. However, any wide character is also safe without escaping. Since we only work with code points, a simple patch does the trick, https://gist.github.com/2872752 , by removing the sub-classification of the Chinese/Japanese characters.
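To see what the charClass line changes, here is a rough Python model of class-based word selection. It is only a sketch and not XTerm's code: the default table is simplified (ASCII letters, digits and '_' share class 48, every other character is a class of its own), and the names `default_class`, `parse_char_class` and `select_word` are made up for the illustration.

```python
def default_class(cp):
    """Simplified default table (an assumption, not XTerm's real one):
    ASCII alphanumerics and '_' share class 48, anything else is its own class."""
    ch = chr(cp)
    if ch.isascii() and (ch.isalnum() or ch == "_"):
        return 48
    return cp


def parse_char_class(spec):
    """Parse a charClass value such as '43-47:48,58:48,64:48' into {code: class}."""
    overrides = {}
    for item in spec.split(","):
        rng, cls = item.split(":")
        low, _, high = rng.partition("-")
        for cp in range(int(low), int(high or low) + 1):
            overrides[cp] = int(cls)
    return overrides


def select_word(line, pos, overrides):
    """Return the run of same-class characters around index pos."""
    def cls(ch):
        return overrides.get(ord(ch), default_class(ord(ch)))

    target = cls(line[pos])
    start = pos
    while start > 0 and cls(line[start - 1]) == target:
        start -= 1
    end = pos + 1
    while end < len(line) and cls(line[end]) == target:
        end += 1
    return line[start:end]


if __name__ == "__main__":
    line = "cat /usr/local/etc/foo.conf | less"
    pos = line.index("local")
    print(select_word(line, pos, {}))
    print(select_word(line, pos, parse_char_class("43-47:48,58:48,64:48")))
```

With the empty override table a double-click on "local" stops at the slashes; with the resource above, '+', ',', '-', '.', '/', ':' and '@' join class 48, so the whole path is selected.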
And, obviously, a font using ISO-10646 mapping always works:
XTerm*font: -*-lucidatypewriter-medium-*-*-*-18-*-*-*-*-*-iso10646-*
XTerm*boldFont: -*-lucidatypewriter-bold-*-*-*-18-*-*-*-*-*-iso10646-*
XTerm*wideFont: -*-wenquanyi bitmap song-bold-*-*-*-17-*-*-*-*-*-*-*
XTerm*forcePackedFont: False
Note that the 'wenquanyi bitmap song' font has no bold version now; this one is generated by gbdfed and only supplies 12pt/100dpi/iso10646.
The -iso10646- tags on the normal fonts enable the Unicode line-drawing characters[1]. However, here comes the bad part of the 'external conversion' design: luit is not just a wrapper around iconv; it is an ISO-2022[2] interpreter at the same time, and ISO-2022 happens to conflict with vt102's alternate charset sequences, \E(0 and \E(B (don't ask me why ncurses' NCURSES_NO_UTF8_ACS option, which is already set by luit, does not work here -- see the title). And XTerm's vt102 parser does not allow some non-standard escape sequences, like ^N/^O, to enable/disable the line-drawing characters.
So we need to force XTerm to recognize a customized pair of escape sequences, like
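For readers who have not met the alternate charset before, the following Python sketch models what a vt102-style terminal does with \E(0 and \E(B. It is a toy interpreter under my own assumptions, not XTerm's parser: the function name `render` is invented, and `DEC_GRAPHICS` lists only a handful of the DEC special graphics characters.

```python
SMACS = "\x1b(0"  # ESC ( 0 : switch G0 to the DEC special graphics set
RMACS = "\x1b(B"  # ESC ( B : switch G0 back to US-ASCII

# A small subset of the DEC special graphics set, mapped to Unicode.
DEC_GRAPHICS = {
    "l": "┌", "k": "┐", "m": "└", "j": "┘",
    "q": "─", "x": "│",
}


def render(stream):
    """Translate a byte-oriented vt102 stream into the Unicode text shown."""
    out = []
    graphics = False
    i = 0
    while i < len(stream):
        if stream.startswith(SMACS, i):
            graphics = True
            i += len(SMACS)
        elif stream.startswith(RMACS, i):
            graphics = False
            i += len(RMACS)
        else:
            ch = stream[i]
            out.append(DEC_GRAPHICS.get(ch, ch) if graphics else ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    top = SMACS + "lqqqqk" + RMACS
    middle = SMACS + "x" + RMACS + " ok " + SMACS + "x" + RMACS
    bottom = SMACS + "mqqqqj" + RMACS
    print(render("\n".join([top, middle, bottom])))
```

The same ESC ( prefix is what ISO-2022 uses to designate a character set into G0, so when luit sits between the application and XTerm and interprets these sequences itself, the switch never reaches the terminal in the form the application intended.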
# ~/.termcap
# This requires a hack to the XTerm code, to make it accept DLE as ESC.
xterm-iso2022|XTerm CJK:\
	:eA@:as=^P(0:ae=^P(B:tc=xterm:
while using a customized terminal name:
XTerm.termName: xterm-iso2022
Fortunately, XTerm's textbook-style, state transition table based vt102 parser is easy to hack: https://gist.github.com/2892430 .
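In case the termcap notation is unfamiliar, this Python sketch decodes the capability strings of the entry above into the characters an application would actually send. It is a minimal reader I wrote for the illustration (the helper names `decode_cap` and `parse_fields` are hypothetical), handling only the ^X and \E notations and not the rest of the termcap syntax.

```python
def decode_cap(value):
    """Decode termcap caret notation (^P) and \\E into real control characters."""
    out = []
    i = 0
    while i < len(value):
        if value[i] == "^" and i + 1 < len(value):
            # ^X is the control character obtained by flipping bit 0x40 of X.
            out.append(chr(ord(value[i + 1].upper()) ^ 0x40))
            i += 2
        elif value.startswith("\\E", i):
            out.append("\x1b")
            i += 2
        else:
            out.append(value[i])
            i += 1
    return "".join(out)


def parse_fields(body):
    """Split the ':'-separated capability fields of one termcap entry."""
    caps = {}
    for field in body.strip(":").split(":"):
        if not field:
            continue
        if field.endswith("@"):
            caps[field[:-1]] = None          # 'xx@' cancels a capability
        elif "=" in field:
            name, _, value = field.partition("=")
            caps[name] = value if name == "tc" else decode_cap(value)
        else:
            caps[field] = True               # boolean capability
    return caps


if __name__ == "__main__":
    caps = parse_fields(":eA@:as=^P(0:ae=^P(B:tc=xterm:")
    for name, value in caps.items():
        print(name, repr(value))
```

So `as` and `ae` start with DLE (0x10) instead of ESC (0x1b), which is the byte the patched XTerm has to accept as an ESC substitute; `eA@` cancels the inherited enable-ACS string and `tc=xterm` pulls in everything else from the stock entry.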
Special note for FreeBSD users: if you use sudo, you have to link your .termcap into /root/ -- the ncurses on FreeBSD reads the termcap info from the kernel, not /etc/termcap; only the user-supplied ones work.
Links:
[1] Box-drawing character: Unix, CP/M, BBS. https://en.wikipedia.org/wiki/Box-drawing_character#Unix.2C_CP.2FM.2C_BBS
[2] ISO/IEC 2022 character sets. https://en.wikipedia.org/wiki/ISO-2022#ISO.2FIEC_2022_character_sets











