When I do: use strict; use warnings; my $regex = qr/[[:upper:]]/; my $line =

Asked: May 22, 2026 (2026-05-22T15:14:40+00:00)

When I do:

use strict; use warnings;
my $regex = qr/[[:upper:]]/;
my $line = MyModule::get_my_line_from_external_source(); #file, db, etc...
print "upper here\n" if( $line =~ $regex );

How does Perl know when it must match only ASCII uppercase and when it must match UTF-8 (Unicode) uppercase?
It is a precompiled regex, so somehow Perl must already know what counts as uppercase. Does that depend on the locale settings? If so, how can I match UTF-8 uppercase in the “C” locale with a precompiled regex?

Update, based on tchrist’s comments:

use strict; use warnings; use Encode;
my $regex = qr/[[:upper:]]/;

my $line = XXX::line();
print "$line: upper1 ", ($line =~ $regex) ? "YES" : "NO", "\n";

my $uline = Encode::decode_utf8($line);
print "$uline: upper2 ", ($uline =~ $regex) ? "YES" : "NO", "\n";

package XXX;
sub line { return "alpha-Ω"; } #returning octets - not utf8 chars

The output is:

alpha-Ω: upper1 NO
alpha-Ω: upper2 YES

Does that mean the precompiled regex is not ‘hard-precompiled’ but ‘soft-precompiled’, so that Perl interprets ‘[[:upper:]]’ according to the UTF8 flag of the matched $line?

1 Answer

Editorial Team · Answer 1 · 2026-05-22T15:14:41+00:00

Before Perl 5.14, this was not very well defined.

As of 5.14, the pattern knows how it was compiled, and you have the /u, /l, /d, /a, and /aa pattern modifiers to choose the character-set rules explicitly. You can also say

use re "/u";

or

use re "/msu";

to turn all those flags on in the lexical scope.

For example, under 5.14:

% perl -le 'print qr/foo/'
(?^:foo)
% perl -E 'say qr/foo/'
(?^u:foo)
% perl -E 'say qr/foo/l'
(?^l:foo)
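
To make the effect on the question's [[:upper:]] concrete, here is a minimal sketch, assuming Perl 5.14 or later. The variable names and the sample string are only illustrative; the octets "\xCE\xA9" are the UTF-8 encoding of Ω, written as escapes so the script does not depend on its own source encoding.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Decode the UTF-8 octets for "alpha-Ω" into a character string.
my $chars = decode('UTF-8', "alpha-\xCE\xA9");

# The modifier is recorded in the compiled pattern itself.
my $unicode = qr/[[:upper:]]/u;   # Unicode rules
my $ascii   = qr/[[:upper:]]/a;   # POSIX classes match ASCII only

# A lexically scoped default, as with the "use re" form shown earlier.
my $scoped;
{
    use re '/u';
    $scoped = qr/[[:upper:]]/;    # compiled as if /u had been written
}

print "$unicode\n";               # stringifies with its "u" flag
print "$scoped\n";                # same flag, picked up from the scope
print "/u: ", ($chars =~ $unicode ? "YES" : "NO"), "\n";
print "/a: ", ($chars =~ $ascii   ? "YES" : "NO"), "\n";
```

With an explicit /u or /a, the answer no longer depends on how the target string happens to be stored internally: Ω counts as uppercase under /u and does not under /a.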

I would steer clear of locales; just use all-Unicode.

BTW, I would make darned sure that that “external source” gave you back a string that was properly decoded; that is, has its UTF8 flag turned on. Character functions work poorly on encoded strings, because they really want decoded strings instead.
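
One way to decode at the boundary is to put an encoding layer on the filehandle, so every line arrives as characters. This is only a sketch: it assumes the external source is UTF-8, and it uses an in-memory file ($octets is a hypothetical stand-in) in place of the real file or database.

```perl
use strict;
use warnings;

# UTF-8 octets for "alpha-Ω" plus a newline, standing in for the source.
my $octets = "alpha-\xCE\xA9\n";

# Read through an :encoding(UTF-8) layer so lines come back decoded.
open my $fh, '<:encoding(UTF-8)', \$octets or die "open: $!";
my $line = <$fh>;
close $fh;
chomp $line;

my $regex = qr/[[:upper:]]/u;

print "characters: ", length($line), "\n";   # counts characters, not bytes
print "upper: ", ($line =~ $regex ? "YES" : "NO"), "\n";
```

For data that does not come through a filehandle, Encode::decode on the raw octets, as in the question's update, achieves the same thing.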
