How do I replace a Unicode numeral subscript or superscript (e.g., ₂) with the corresponding numeral?

Question


Asked: May 30, 2026


How do I replace a Unicode numeral subscript or superscript (e.g., ₂) with the corresponding numeral (i.e., 2) using regular expressions? I can of course replace each of them separately, but that is ten lines of code…

I am implementing this in Perl, but that should not really matter.


1 Answer

Editorial Team · Answer 1 · 2026-05-30T22:43:35+00:00

Here is a Perl function from the unisupers script that converts text to Unicode superscripts:

use utf8;    # the tr/// lists contain literal non-ASCII characters
sub convert_to_superscripts (_) {
   my $string = $_[0];
   $string =~ tr[+−=()0123456789AaÆᴂɐɑɒBbcɕDdðEeƎəɛɜɜfGgɡɣhHɦIiɪɨᵻɩjJʝɟKklLʟᶅɭMmɱNnɴɲɳŋOoɔᴖᴗɵȢPpɸrRɹɻʁsʂʃTtƫUuᴜᴝʉɥɯɰʊvVʋʌwWxyzʐʑʒꝯᴥβγδθφχнნʕⵡ]
                [⁺⁻⁼⁽⁾⁰¹²³⁴⁵⁶⁷⁸⁹ᴬᵃᴭᵆᵄᵅᶛᴮᵇᶜᶝᴰᵈᶞᴱᵉᴲᵊᵋᶟᵌᶠᴳᵍᶢˠʰᴴʱᴵⁱᶦᶤᶧᶥʲᴶᶨᶡᴷᵏˡᴸᶫᶪᶩᴹᵐᶬᴺⁿᶰᶮᶯᵑᴼᵒᵓᵔᵕᶱᴽᴾᵖᶲʳᴿʴʵʶˢᶳᶴᵀᵗᶵᵁᵘᶸᵙᶶᶣᵚᶭᶷᵛⱽᶹᶺʷᵂˣʸᶻᶼᶽᶾꝰᵜᵝᵞᵟᶿᵠᵡᵸჼˤⵯ];
   return $string;
}

And here is the corresponding function for subscripts, from the unisubs script:

use utf8;    # the tr/// lists contain literal non-ASCII characters
sub convert_to_subscripts (_) {
   my $string = $_[0];
   $string =~ tr[+−=()0123456789aeəhijklmnoprstuvxβγρφχ]
                [₊₋₌₍₎₀₁₂₃₄₅₆₇₈₉ₐₑₔₕᵢⱼₖₗₘₙₒₚᵣₛₜᵤᵥₓᵦᵧᵨᵩᵪ];
   return $string;
}

To convert back to ordinary characters, you just have to go the other way: swap the two lists in the tr///.
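For the question as asked (digits only), a minimal sketch of going the other way might look like this. The name `convert_from_scripts_digits` is hypothetical and not part of the unisupers or unisubs scripts; the sketch assumes the source file is saved as UTF-8 (hence `use utf8;`) and covers only the twenty superscript and subscript digits, so a single tr/// replaces the separate substitutions.

```perl
use strict;
use warnings;
use utf8;    # the tr/// lists below contain literal non-ASCII characters

# Hypothetical helper (not from unisupers/unisubs): the digit part of
# convert_to_superscripts and convert_to_subscripts with the lists swapped.
sub convert_from_scripts_digits (_) {
    my $string = $_[0];
    $string =~ tr[⁰¹²³⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉]
                 [01234567890123456789];
    return $string;
}

binmode STDOUT, ':encoding(UTF-8)';
print convert_from_scripts_digits("H₂O, x² + y³"), "\n";    # prints "H2O, x2 + y3"
```

The remaining pairs (signs, parentheses, letters) can be added the same way by copying them from the lists above with the two sides swapped. The lists above appear to use U+2212 MINUS SIGN rather than the ASCII hyphen-minus, so reversing them would turn ⁻ and ₋ into −, not -.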

A simpler approach is to use the compatibility normalization forms (NFKD and NFKC), which return the base characters instead of their superscript or subscript versions. I haven't checked whether they are exact inverses of the functions above. You can experiment with them using the nfkd and nfkc scripts.
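A minimal sketch of the normalization route, assuming the core Unicode::Normalize module (which provides `NFKC` and `NFKD`). It also shows the catch: NFKC on the whole string folds unrelated compatibility characters too, such as the ﬁ ligature, so the second variant uses a regex substitution to apply NFKC only to the superscript and subscript digits. The variable names are illustrative.

```perl
use strict;
use warnings;
use utf8;                           # string literals below are UTF-8
use Unicode::Normalize qw(NFKC);    # core module

binmode STDOUT, ':encoding(UTF-8)';

my $text = "H₂O, x² + y³, ﬁle";

# Whole-string NFKC: the digits are folded, but so is every other
# compatibility character (the "ﬁ" ligature becomes "fi").
my $all = NFKC($text);

# Regex-limited: apply NFKC only to superscript/subscript digits.
(my $digits_only = $text) =~ s/([⁰¹²³⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉])/NFKC($1)/ge;

print "$all\n";            # H2O, x2 + y3, file
print "$digits_only\n";    # H2O, x2 + y3, ﬁle
```

Each of these digits has a compatibility mapping to a single ASCII digit, so NFKD would give the same result as NFKC here.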
