I can search for a CJK char (such as 小) by using its Unicode code point:
/\%u5c0f
/[\u5c0f]
I cannot search for all CJK chars by using [\u4E00-\u9FFF], because the Vim manual says:
:help /[]
NOTE: The other backslash codes mentioned above do not work inside []!
Is there a way to do the job?
Vim cannot actually do this by itself, since you aren’t given access to Unicode properties like \p{Han}.
As of Unicode v6.0, the range of codepoints for characters in the Han script is:
Whereas with Unicode v6.1, the range of Han codepoints has changed to:
I also seem to recall that Vim has difficulties expressing astral code points, which are needed for this to work correctly. For example, using the flexible \x{HHHHHH} notation from Java 7 or Perl, you would have:
Notice that the last part of the range is \x{2F800}-\x{2FA1D}, which is beyond the BMP. But what you really need is \p{Han} (meaning \p{Script=Han}). This again shows that regex dialects that don’t support at least Level 1 of UTS#18: Basic Unicode Support are inadequate for working with Unicode. Vim’s regexes are inadequate for basic Unicode work.
EDITED TO ADD
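To illustrate the point about astral code points in a dialect that can express them, here is a minimal Python sketch. It is only an illustration: the character class combines the [\u4E00-\u9FFF] range from the question with the \x{2F800}-\x{2FA1D} range mentioned above, which is not the complete Han script, and the name contains_cjk_sample is made up for this example.

```python
import re

# Illustrative pattern only: the BMP range from the question plus the astral
# range U+2F800..U+2FA1D. This is NOT the full set of Han code points.
# Python's re module accepts \uXXXX and \UXXXXXXXX escapes inside [].
CJK_SAMPLE = re.compile(r"[\u4E00-\u9FFF\U0002F800-\U0002FA1D]")


def contains_cjk_sample(text: str) -> bool:
    """Return True if text has a character in either illustrative range."""
    return CJK_SAMPLE.search(text) is not None


if __name__ == "__main__":
    print(contains_cjk_sample("小"))            # U+5C0F, inside the BMP range
    print(contains_cjk_sample(chr(0x2F800)))    # first code point of the astral range
    print(contains_cjk_sample("plain ASCII"))
```

Python's standard re module has no \p{Han} either, so a hand-written class like this goes stale whenever the Unicode ranges change, which is the same weakness described above.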
Here’s the program that dumps out the ranges of code points that apply to any given Unicode script.
Its answers depend on which version of Perl — and thus, which version of Unicode — you’re running it against.
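A rough Python sketch of the same idea follows. It is not the Perl program referred to above, and it rests on a loud assumption: Python's standard library does not expose the Script property, so the sketch uses the character-name prefix "CJK UNIFIED IDEOGRAPH" as a stand-in. That undercounts \p{Script=Han}, which also covers radicals, compatibility ideographs and a few other characters. The helper names to_ranges, looks_like_cjk_ideograph and ideograph_ranges are invented for this example.

```python
import sys
import unicodedata


def to_ranges(code_points):
    """Collapse an ascending iterable of ints into inclusive (start, end) pairs."""
    ranges = []
    for cp in code_points:
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], cp)   # extend the current run
        else:
            ranges.append((cp, cp))            # start a new run
    return ranges


def looks_like_cjk_ideograph(cp):
    # ASSUMPTION: name prefix used as a proxy; this is not Script=Han.
    return unicodedata.name(chr(cp), "").startswith("CJK UNIFIED IDEOGRAPH")


def ideograph_ranges():
    """Ranges of code points that pass the name-based proxy test."""
    return to_ranges(
        cp for cp in range(sys.maxunicode + 1) if looks_like_cjk_ideograph(cp)
    )


if __name__ == "__main__":
    # The result depends on the Unicode data bundled with the interpreter.
    print("Unicode version:", unicodedata.unidata_version)
    for start, end in ideograph_ranges():
        print(f"{start:04X}-{end:04X}")
```

As with the Perl version, the ranges printed depend on the Unicode data shipped with the interpreter; unicodedata.unidata_version reports which version that is.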
You can use the corelist -a Unicode command to see which version of Unicode goes with which version of Perl. Here is selected output: