At the moment, I’m writing these arrays by hand. For example, the Miscellaneous Mathematical

Editorial Team

Asked: May 15, 2026 (2026-05-15T02:00:39+00:00)

At the moment, I’m writing these arrays by hand.

For example, the Miscellaneous Mathematical Symbols-A block has an entry in hash like this:

my %symbols = (
    ...
    miscellaneous_mathematical_symbols_a => [(0x27C0..0x27CA), 0x27CC,
        (0x27D0..0x27EF)],
    ...
);

The simpler, ‘continuous’ array

miscellaneous_mathematical_symbols_a => [0x27C0..0x27EF]

doesn’t work because Unicode blocks have holes in them. For example, there’s nothing at 0x27CB. Take a look at the code chart [PDF].

Writing these arrays by hand is tedious, error-prone and a bit fun. And I get the feeling that someone has already tackled this in Perl!

1 Answer

Editorial Team · Answer 1 · 2026-05-15T02:00:39+00:00

Perhaps you want Unicode::UCD? Use its charblock routine to get the range of any named block. If you want to get those names, you can use charblocks.
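
As a quick illustration of those two routines, here is a minimal sketch. It assumes the block is looked up by its name as spelled in the Unicode block list ("Miscellaneous Mathematical Symbols-A") and that each returned range starts with the first and last code point of the block.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charblock charblocks);

# Given a block name, charblock() returns a reference to a list of ranges.
# The first two elements of each range are the start and end code points.
my $ranges = charblock('Miscellaneous Mathematical Symbols-A');
my ( $start, $stop ) = @{ $ranges->[0] };
printf "%X..%X\n", $start, $stop;

# charblocks() returns a hash reference keyed by block name.
my $blocks = charblocks();
print "$_\n" for sort grep { /Math/ } keys %$blocks;
```

Note that this only gives the bounds of the block. It says nothing about the holes inside it, which is why the code further down also checks each code point against \p{Assigned}.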

This module is really just an interface to the Unicode databases that come with Perl already, so if you have to do something fancier, you can look at the lib/5.x.y/unicore/UnicodeData.txt or the various other files in that same directory to get what you need.

Here’s what I came up with to create your %symbols. I go through all the blocks (although in this sample I skip the ones without “Math” in their name). For each block I get the starting and ending code points and check which ones are assigned. From that, I create a custom property that I can use to check whether a character is both in the block and assigned.

use strict;
use warnings;

digest_blocks();

my $property = 'My::InMiscellaneousMathematicalSymbolsA';

foreach ( 0x27BA..0x27F3 )
    {
    # chr($_) turns the code point into a character to match against
    my $in = chr($_) =~ m/\p{$property}/;

    printf "%X is %sin $property\n",
        $_, $in ? '' : 'not ';
    }


sub digest_blocks {
    use Unicode::UCD qw(charblocks);

    my $blocks = charblocks();

    foreach my $block ( keys %$blocks )
        {
        next unless $block =~ /Math/; # just to make the output small

        my( $start, $stop ) = @{ $blocks->{$block}[0] };

        $blocks->{$block} = {
            assigned   => [ grep { chr =~ /\A\p{Assigned}\z/ } $start .. $stop ],
            unassigned => [ grep { chr !~ /\A\p{Assigned}\z/ } $start .. $stop ],
            start      => $start,
            stop       => $stop,
            name       => $block,
            };

        define_my_property( $blocks->{$block} );
        }
    }

sub define_my_property {
    my $block = shift;

    (my $subname = $block->{name}) =~ s/\W//g;
    $block->{my_property} = "My::In$subname"; # needs In or Is

    no strict 'refs';
    my $string = join "\n", # can do ranges here too
        map { sprintf "%X", $_ } 
        @{ $block->{assigned} };

    *{"My::In$subname"} = sub { $string };
    }
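
As an aside, if all you want is the hash of arrays from the question rather than custom properties, the same approach can fill %symbols directly. This is only a sketch: the way the block name is turned into a key (lowercase, non-word characters replaced by underscores) is my guess at the naming convention used in the question.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charblocks);

my $blocks = charblocks();
my %symbols;

foreach my $name ( grep { /Math/ } keys %$blocks ) {
    # Assumed key convention: "Miscellaneous Mathematical Symbols-A"
    # becomes "miscellaneous_mathematical_symbols_a".
    ( my $key = lc $name ) =~ s/\W+/_/g;

    # A block normally has a single range, but loop over all of them anyway.
    foreach my $range ( @{ $blocks->{$name} } ) {
        my ( $start, $stop ) = @$range;
        push @{ $symbols{$key} },
            grep { chr($_) =~ /\A\p{Assigned}\z/ } $start .. $stop;
    }
}

printf "%d assigned code points in Miscellaneous Mathematical Symbols-A\n",
    scalar @{ $symbols{miscellaneous_mathematical_symbols_a} };
```

Which code points count as assigned depends on the Unicode version bundled with your perl, so the holes you get may differ from the ones listed in the question.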

If I were going to do this a lot, I’d use the same thing to create a Perl source file that has the custom properties already defined so I can just use them right away in any of my work. None of the data should change until you update your Unicode data.

sub define_my_property {
    my $block = shift;

    (my $subname = $block->{name}) =~ s/\W//g;
    $block->{my_property} = "My::In$subname"; # needs In or Is

    # collapse consecutive code points into "start<TAB>end" lines
    my $string = num2range( @{ $block->{assigned} } );

    print <<"HERE";
sub My::In$subname {
    return <<'CODEPOINTS';
$string
CODEPOINTS
    }

HERE
    }

# http://www.perlmonks.org/?node_id=87538
sub num2range {
  local $_ = join ',' => sort { $a <=> $b } @_;
  s/(?<!\d)(\d+)(?:,((??{$++1})))+(?!\d)/$1\t$+/g;
  s/(\d+)/ sprintf "%X", $1/eg;
  s/,/\n/g;
  return $_;
}
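
If the recursive regex in num2range is hard to follow, the same collapsing can be done with a plain loop. This is a sketch of an alternative, not the PerlMonks code; the name code_points_to_ranges is made up for this example, and it aims to produce the same tab-separated hex format.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse a list of code points into lines of
# "START<TAB>END" (or just "START" for a single code point), in hex.
sub code_points_to_ranges {
    my @sorted = sort { $a <=> $b } @_;
    return '' unless @sorted;

    my $format = sub {
        my ( $lo, $hi ) = @_;
        return $lo == $hi
            ? sprintf( '%X', $lo )
            : sprintf( "%X\t%X", $lo, $hi );
    };

    my @lines;
    my ( $first, $last ) = ( $sorted[0], $sorted[0] );

    foreach my $n ( @sorted[ 1 .. $#sorted ] ) {
        # Extend the current run on a consecutive (or duplicate) value.
        if ( $n <= $last + 1 ) { $last = $n; next }

        # Otherwise close the current run and start a new one.
        push @lines, $format->( $first, $last );
        ( $first, $last ) = ( $n, $n );
    }
    push @lines, $format->( $first, $last );

    return join "\n", @lines;
}

print code_points_to_ranges( 0x27C0 .. 0x27CA, 0x27CC, 0x27D0 .. 0x27EF ), "\n";
```

It is longer than the regex version, but each step is explicit, and it works on the numbers directly instead of on a joined string.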

That gives me output suitable for a Perl library:

sub My::InMiscellaneousMathematicalSymbolsA {
    return <<'CODEPOINTS';
27C0    27CA
27CC
27D0    27EF
CODEPOINTS
    }

sub My::InSupplementalMathematicalOperators {
    return <<'CODEPOINTS';
2A00    2AFF
CODEPOINTS
    }

sub My::InMathematicalAlphanumericSymbols {
    return <<'CODEPOINTS';
1D400   1D454
1D456   1D49C
1D49E   1D49F
1D4A2
1D4A5   1D4A6
1D4A9   1D4AC
1D4AE   1D4B9
1D4BB
1D4BD   1D4C3
1D4C5   1D505
1D507   1D50A
1D50D   1D514
1D516   1D51C
1D51E   1D539
1D53B   1D53E
1D540   1D544
1D546
1D54A   1D550
1D552   1D6A5
1D6A8   1D7CB
1D7CE   1D7FF
CODEPOINTS
    }

sub My::InMiscellaneousMathematicalSymbolsB {
    return <<'CODEPOINTS';
2980    29FF
CODEPOINTS
    }

sub My::InMathematicalOperators {
    return <<'CODEPOINTS';
2200    22FF
CODEPOINTS
    }
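
To show how one of these generated subs would be used, here is a small self-contained sketch. It pastes the first sub from the output above into a script and matches a few code points against it. The range lines are taken as shown, with whitespace between the two hex numbers.

```perl
use strict;
use warnings;

# Pasted from the generated output above.
sub My::InMiscellaneousMathematicalSymbolsA {
    return <<'CODEPOINTS';
27C0    27CA
27CC
27D0    27EF
CODEPOINTS
}

# Check the code points around the hole at 0x27CB.
foreach my $code ( 0x27CA .. 0x27CC ) {
    my $in = chr($code) =~ /\p{My::InMiscellaneousMathematicalSymbolsA}/;
    printf "%X is %sin the custom property\n", $code, $in ? '' : 'not ';
}
```

Because the property is defined by the pasted list rather than by a live lookup, the result stays the same until you regenerate the file against newer Unicode data.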
