What’s the correct way to write Unicode-aware one-liners in Perl? The obvious way:
$ echo 'フーバー' | perl -lne 'print if /フ/'
フーバー
…appears to work at first sight, but only by accident: the input is treated as a sequence of bytes rather than characters, as the next example shows:
$ echo 'フーバー != フウバー' | perl -mString::Diff=diff -lne 'print join(" ", diff($1, $2)) if /(.*)!=(.*)/'
フ?[??]バー[ ] { }フ?{??}バー
Using the -C flag to set STDIN, STDOUT, etc. to UTF‑8 is not enough by itself:
$ echo 'フーバー' | perl -C -lne 'print if /フ/'
[no output]
…because now the input is decoded into characters, but the text in -e is still not interpreted as Unicode, so the pattern no longer matches.
So, assuming a sane locale (that is, one of the form "*.UTF‑8"), is this the way to go?
$ perl -C -Mutf8 [...]
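Yes, loading the utf8 pragma is required to interpret the “フ” UTF‑8 sequence in the source code as a character instead of as separate bytes.
The utf8 pragma is locale-independent, and so is the -C command-line switch when it is given explicit options such as -CSD. Bare -C is shorthand for -CSDL, where the L makes the I/O layers conditional on the locale environment variables. The shell’s echo command is not locale-independent.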