Below I attach my script in Perl. I am testing the number 1234 with

Asked: June 18, 2026

Below is my Perl script. I am testing the number 1234 alongside an equivalent written in Japanese. (I copied it from Wikipedia, so it may not be 100% correct.)

Using any of these patterns:

\p{decimal number}+
\p{Number}+
\d+

the code works fine for the ASCII version, but for Japanese the only example I could find is this character class:

[0-9\x{3041}-\x{3096}\x{30a1}-\x{30fc}\x{4e00}-\x{9faf}]

What am I doing wrong in this case?

use 5.016;

use utf8;
use charnames   qw< :full >;
use feature     qw< unicode_strings >;

use Test::More tests => 2;

sub is_valid {
  my $string = shift;

  $string ~~ /^[0-9\x{3041}-\x{3096}\x{30a1}-\x{30fc}\x{4e00}-\x{9faf}]+$/u

  #/\p{decimal number}+/msx
}

ok(is_valid("1234"), "ascii");
ok(is_valid("壱弐参四"), "japanese");

1 Answer

Editorial Team · Answer 1 · 2026-06-18T09:11:31+00:00

Your code passes for me on v5.14.

The /u flag doesn’t do what you think it does there, since you have just ASCII in the pattern. You require v5.16, but /u showed up in v5.14. No big whoop unless there’s some v5.16 enhancement you’re trying to use.

As many people have noted, there’s a semantic difference between numbers and digits. I think you just want to match a run of digits. The problem is that the UCS doesn’t label the characters you want to match as digits.
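
To see that distinction concretely, here is a small sketch that runs the patterns from the question against an ASCII digit and one kanji numeral. The `@checks` table and the `%result` hash are names I made up for the demo, and the `\p{Letter}` row is only there for comparison.

```perl
use strict;
use warnings;
use utf8;    # the kanji below is a UTF-8 literal

binmode STDOUT, ':encoding(UTF-8)';

# Patterns from the question, plus Letter for comparison.
my @checks = (
    [ '\d'                 => qr/\A\d\z/ ],
    [ '\p{Decimal_Number}' => qr/\A\p{Decimal_Number}\z/ ],
    [ '\p{Number}'         => qr/\A\p{Number}\z/ ],
    [ '\p{Letter}'         => qr/\A\p{Letter}\z/ ],
);

my %result;    # $result{char}{pattern name} = 1 or 0
for my $char ( "4", "四" ) {
    for my $check (@checks) {
        my ( $name, $re ) = @$check;
        $result{$char}{$name} = $char =~ $re ? 1 : 0;
        printf "%s %-20s %s\n", $char, $name,
            $result{$char}{$name} ? "matches" : "no match";
    }
}
```

The kanji should fail all three number patterns and match only `\p{Letter}`, because its general category is a letter category rather than a number category.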

As such, you created a very expansive character class to do that, and I think you’re stuck with it. You probably don’t want to keep typing it out, though. You could hide it all in a subroutine, but you can also define your own properties. To do that, you create a specially named subroutine (the name must begin with In or Is) that returns a string with lines of character ranges as hex values. Here’s an example from perlunicode:

sub InKana {
    return <<END;
3040\t309F
30A0\t30FF
END
}
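
Once a subroutine like that is in scope, its name works inside \p{...} like a built-in property. A minimal sketch, reusing the InKana definition above; the `is_kana` helper and the sample strings are made up for illustration.

```perl
use strict;
use warnings;
use utf8;    # the sample strings below are UTF-8 literals

# User-defined property from perlunicode: one "start<TAB>end" range per line.
sub InKana {
    return <<END;
3040\t309F
30A0\t30FF
END
}

# Hypothetical helper: true when the whole string is hiragana or katakana.
sub is_kana {
    my ($string) = @_;
    return $string =~ /\A\p{InKana}+\z/ ? 1 : 0;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $s ( "ひらがな", "カタカナ", "abc" ) {
    printf "%s: %s\n", $s, is_kana($s) ? "kana" : "not kana";
}
```

The property is looked up by name in the package where the pattern is compiled, so the subroutine has to live in (or be fully qualified from) that package.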

You might use the Unicode::Unihan module to figure out which code points you want. You can do it with code, but all this is doing is looking in the Unihan database file that has the same name as the method. Someone who actually knows Japanese will have to tweak this to select the right characters:

use v5.10;

use Number::Range;
use Unicode::Unihan;

my $db = Unicode::Unihan->new;
my $range = Number::Range->new;

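# Walk the code points and keep every Han character that Unihan
# lists with a numeric value.
foreach my $u ( 0 .. 0x01dfff ) {
    my $char = chr $u;
    next unless $char =~ /\p{Script: Han}/;
    # Try the three Unihan numeric fields in turn. Note that || also
    # skips a value of 0, so a character whose value is zero is not reported.
    my $value = 
        $db->PrimaryNumeric( $char ) ||
        $db->AccountingNumeric( $char ) ||
        $db->OtherNumeric( $char )
        ;
    next unless defined $value;
    my $hex = sprintf "%X", $u;
    say chr($u), " (U+$hex) has numeric value: ", $value;
    # Remember the code point so the ranges can be printed afterwards.
    $range->addrange( $u );
    }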

my $sub = 
q(sub InJapaneseDigit {
    return <<'HERE';
)

.

join( "\n", 
    map { 
        join "\t", 
            map { sprintf "%X", $_ } 
            split /\.\./;  
        } 
    split /,/, $range->range 
    )

.

qq(\nHERE\n});

say $sub;

That program outputs:

㐅 (U+3405) has numeric value: 5
㒃 (U+3483) has numeric value: 2
㠪 (U+382A) has numeric value: 5
㭍 (U+3B4D) has numeric value: 7
一 (U+4E00) has numeric value: 1
七 (U+4E03) has numeric value: 7
万 (U+4E07) has numeric value: 10000
三 (U+4E09) has numeric value: 3
九 (U+4E5D) has numeric value: 9
二 (U+4E8C) has numeric value: 2
五 (U+4E94) has numeric value: 5
亖 (U+4E96) has numeric value: 4
亿 (U+4EBF) has numeric value: 100000000
什 (U+4EC0) has numeric value: 10
仟 (U+4EDF) has numeric value: 1000
仨 (U+4EE8) has numeric value: 3
伍 (U+4F0D) has numeric value: 5
佰 (U+4F70) has numeric value: 100
億 (U+5104) has numeric value: 100000000
兆 (U+5146) has numeric value: 1000000000000
兩 (U+5169) has numeric value: 2
八 (U+516B) has numeric value: 8
六 (U+516D) has numeric value: 6
十 (U+5341) has numeric value: 10
千 (U+5343) has numeric value: 1000
卄 (U+5344) has numeric value: 20
卅 (U+5345) has numeric value: 30
卌 (U+534C) has numeric value: 40
叁 (U+53C1) has numeric value: 3
参 (U+53C2) has numeric value: 3
參 (U+53C3) has numeric value: 3
叄 (U+53C4) has numeric value: 3
四 (U+56DB) has numeric value: 4
壱 (U+58F1) has numeric value: 1
壹 (U+58F9) has numeric value: 1
幺 (U+5E7A) has numeric value: 1
廾 (U+5EFE) has numeric value: 9
廿 (U+5EFF) has numeric value: 20
弌 (U+5F0C) has numeric value: 1
弍 (U+5F0D) has numeric value: 2
弎 (U+5F0E) has numeric value: 3
弐 (U+5F10) has numeric value: 2
拾 (U+62FE) has numeric value: 10
捌 (U+634C) has numeric value: 8
柒 (U+67D2) has numeric value: 7
漆 (U+6F06) has numeric value: 7
玖 (U+7396) has numeric value: 9
百 (U+767E) has numeric value: 100
肆 (U+8086) has numeric value: 4
萬 (U+842C) has numeric value: 10000
貮 (U+8CAE) has numeric value: 2
貳 (U+8CB3) has numeric value: 2
贰 (U+8D30) has numeric value: 2
阡 (U+9621) has numeric value: 1000
陆 (U+9646) has numeric value: 6
陌 (U+964C) has numeric value: 100
陸 (U+9678) has numeric value: 6

sub InJapaneseDigit {
        return <<'HERE';
3405
3483
382A
3B4D
4E00
4E03
4E07
4E09
4E5D
4E8C
4E94
4E96
4EBF    4EC0
4EDF
4EE8
4F0D
4F70
5104
5146
5169
516B
516D
5341
5343    5345
534C
53C1    53C4
56DB
58F1
58F9
5E7A
5EFE    5EFF
5F0C    5F0E
5F10
62FE
634C
67D2
6F06
7396
767E
8086
842C
8CAE
8CB3
8D30
9621
9646
964C
9678
HERE
}
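
With a property like that in place, the is_valid subroutine from the question can shrink to a short character class. This is only a sketch: the table below is an abbreviated copy holding just the code points this demo needs (一 二 三 四 and the formal forms 壱 弐 参), and I kept ASCII 0-9 in the class because the generated table does not include them.

```perl
use strict;
use warnings;
use utf8;    # the test strings below are UTF-8 literals

# Abbreviated copy of the generated table, for illustration only.
# A line with a single hex number adds that one code point.
sub InJapaneseDigit {
    return <<'HERE';
4E00
4E09
4E8C
53C2
56DB
58F1
5F10
HERE
}

# Accept a non-empty run of ASCII digits and/or characters from the table.
sub is_valid {
    my ($string) = @_;
    return $string =~ /\A[0-9\p{InJapaneseDigit}]+\z/ ? 1 : 0;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $s ( "1234", "壱弐参四", "abcd" ) {
    printf "%s: %s\n", $s, is_valid($s) ? "valid" : "invalid";
}
```

This only checks that every character is in the set. It says nothing about whether the sequence is a well-formed number, which is the numbers-versus-digits difference again.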
