Editorial Team
Asked: May 24, 2026

Is there any way to enumerate all of a character’s Unicode properties in Ruby?

I can use Ruby 1.9’s Regexp class to test whether a given character has a particular property (e.g., some_char =~ /\p{P}/ tests whether some_char is punctuation). But a character can have multiple properties ("(", for example, is both punctuation and ASCII), so it would be nice to get a list of all of a character’s properties.

I could probably do this by hand using UnicodeData.txt, but this seems like the sort of thing that has probably already been done somewhere. UnicodeUtils doesn’t appear to have anything along these lines, and Googling didn’t turn up anything obvious. Thanks!

1 Answer

    Editorial Team added an answer on May 24, 2026 at 5:33 pm

    You can call out to my uniprops script.

    $ uniprops -p delta greek:delta Greek:Delta
        U+1E9F ‹ẟ› \N{ LATIN SMALL LETTER DELTA }:
            \w \pL \p{LC} \p{L_} \p{L&} \p{Ll}
        U+03B4 ‹δ› \N{ GREEK SMALL LETTER DELTA }:
            \w \pL \p{LC} \p{L_} \p{L&} \p{Ll}
        U+0394 ‹Δ› \N{ GREEK CAPITAL LETTER DELTA }:
            \w \pL \p{LC} \p{L_} \p{L&} \p{Lu}
    
    $ uniprops \# ç π
        U+0023 ‹#› \N{ NUMBER SIGN }:
            \pP \p{Po}
            All Any ASCII Assigned Common Zyyy Po P Gr_Base
               Grapheme_Base Graph GrBase Other_Punctuation Punct Pat_Syn
               Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct
               Print Punctuation
        U+00E7 ‹ç› \N{ LATIN SMALL LETTER C WITH CEDILLA }:
            \w \pL \p{LC} \p{L_} \p{L&} \p{Ll}
            All Any Alnum Alpha Alphabetic Assigned InLatin1 Cased
               Cased_Letter LC Changes_When_Casemapped CWCM
               Changes_When_Titlecased CWT Changes_When_Uppercased CWU Ll
               L Gr_Base Grapheme_Base Graph GrBase ID_Continue IDC
               ID_Start IDS Letter L_ Latin Latn Lowercase_Letter Lower
               Lowercase Print Word XID_Continue XIDC XID_Start XIDS
        U+03C0 ‹π› \N{ GREEK SMALL LETTER PI }:
            \w \pL \p{LC} \p{L_} \p{L&} \p{Ll}
            All Any Alnum Alpha Alphabetic Assigned Greek Is_Greek
               InGreek Cased Cased_Letter LC Changes_When_Casemapped CWCM
               Changes_When_Titlecased CWT Changes_When_Uppercased CWU Ll
               L Gr_Base Grapheme_Base Graph GrBase Grek Greek_And_Coptic
               ID_Continue IDC ID_Start IDS Letter L_ Lowercase_Letter
               Lower Lowercase Print Word XID_Continue XIDC XID_Start XIDS
    
    
    $ uniprops -a 'MICRO SIGN'
    U+00B5 ‹µ› \N{MICRO SIGN}
        \w \pL \p{LC} \p{L_} \p{L&} \p{Ll}
        All Any Alnum Alpha Alphabetic Assigned InLatin1 Cased Cased_Letter LC Changes_When_Casefolded CWCF Changes_When_Casemapped CWCM
           Changes_When_NFKC_Casefolded CWKCF Changes_When_Titlecased CWT Changes_When_Uppercased CWU Common Zyyy Ll L Gr_Base Grapheme_Base
           Graph GrBase ID_Continue IDC ID_Start IDS Letter L_ Latin_1 Latin_1_Supplement Lowercase_Letter Lower Lowercase Print Word
           XID_Continue XIDC XID_Start XIDS X_POSIX_Alnum X_POSIX_Alpha X_POSIX_Graph X_POSIX_Lower X_POSIX_Print X_POSIX_Word
        Age=1.1 Bidi_Class=L Bidi_Class=Left_To_Right BC=L Block=Latin_1 Block=Latin_1_Supplement BLK=Latin1 Canonical_Combining_Class=0
           Canonical_Combining_Class=Not_Reordered CCC=NR Canonical_Combining_Class=NR Script=Common Decomposition_Type=Com
           Decomposition_Type=Compat DT=Com Decomposition_Type=Non_Canon Decomposition_Type=Non_Canonical DT=NonCanon East_Asian_Width=Neutral
           Grapheme_Cluster_Break=Other GCB=XX Grapheme_Cluster_Break=XX Hangul_Syllable_Type=NA Hangul_Syllable_Type=Not_Applicable HST=NA
           Joining_Group=No_Joining_Group JG=NoJoiningGroup Joining_Type=Non_Joining JT=U Joining_Type=U Line_Break=AL Line_Break=Alphabetic
           LB=AL Numeric_Type=None NT=None Numeric_Value=NaN NV=NaN Present_In=1.1 IN=1.1 Present_In=2.0 IN=2.0 Present_In=2.1 IN=2.1
           Present_In=3.0 IN=3.0 Present_In=3.1 IN=3.1 Present_In=3.2 IN=3.2 Present_In=4.0 IN=4.0 Present_In=4.1 IN=4.1 Present_In=5.0 IN=5.0
           Present_In=5.1 IN=5.1 Present_In=5.2 IN=5.2 Present_In=6.0 IN=6.0 SC=Zyyy Script=Zyyy Sentence_Break=LO Sentence_Break=Lower SB=LO
           Word_Break=ALetter WB=LE Word_Break=LE _X_Begin
    
    $ uniprops -a 2011
    U+2011 ‹‑› \N{NON-BREAKING HYPHEN}
        \pP \p{Pd}
        All Any Assigned InGeneralPunctuation Changes_When_NFKC_Casefolded CWKCF Common Zyyy Dash Dash_Punctuation Pd P General_Punctuation
           Gr_Base Grapheme_Base Graph GrBase Punct Pat_Syn Pattern_Syntax PatSyn Print Punctuation X_POSIX_Graph X_POSIX_Print X_POSIX_Punct
        Age=1.1 Bidi_Class=ON Bidi_Class=Other_Neutral BC=ON Block=General_Punctuation Canonical_Combining_Class=0
           Canonical_Combining_Class=Not_Reordered CCC=NR Canonical_Combining_Class=NR Script=Common Decomposition_Type=Nb
           Decomposition_Type=Nobreak DT=Nb Decomposition_Type=Non_Canon Decomposition_Type=Non_Canonical DT=NonCanon East_Asian_Width=Neutral
           Grapheme_Cluster_Break=Other GCB=XX Grapheme_Cluster_Break=XX Hangul_Syllable_Type=NA Hangul_Syllable_Type=Not_Applicable HST=NA
           Joining_Group=No_Joining_Group JG=NoJoiningGroup Joining_Type=Non_Joining JT=U Joining_Type=U Line_Break=GL Line_Break=Glue LB=GL
           Numeric_Type=None NT=None Numeric_Value=NaN NV=NaN Present_In=1.1 IN=1.1 Present_In=2.0 IN=2.0 Present_In=2.1 IN=2.1 Present_In=3.0
           IN=3.0 Present_In=3.1 IN=3.1 Present_In=3.2 IN=3.2 Present_In=4.0 IN=4.0 Present_In=4.1 IN=4.1 Present_In=5.0 IN=5.0 Present_In=5.1
           IN=5.1 Present_In=5.2 IN=5.2 Present_In=6.0 IN=6.0 SC=Zyyy Script=Zyyy Sentence_Break=Other SB=XX Sentence_Break=XX Word_Break=Other
           WB=XX Word_Break=XX _X_Begin
    
        $ uniprops -l | grep Greek | sort -dfu
        Blk=Greek
        Block:Ancient_Greek_Musical_Notation
        Block:Ancient_Greek_Numbers
        Block:Greek
        Block=Greek_And_Coptic
        Block:Greek_Extended
        Greek
        Greek_And_Coptic
        InAncientGreekMusicalNotation
        InAncientGreekNumbers
        InGreek
        InGreekExtended
        Is_Greek
        Script=Greek
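
If shelling out is not an option, a rough pure-Ruby approximation is to test the character against a list of property names with the \p{...} construct mentioned in the question. This is only a sketch and not how uniprops works: CANDIDATE_PROPERTIES and unicode_properties are hypothetical names, the list is a small illustrative subset, and which names the regex engine accepts depends on the Ruby version, so rejected names are simply skipped.

```ruby
# Illustrative subset of property names (an assumption, not a complete list).
CANDIDATE_PROPERTIES = %w[
  Any Assigned ASCII Alnum Alpha Punct Lower Upper
  L Ll Lu P Po Ps Pd N Nd
  Latin Greek Common
].freeze

# Returns the candidate property names that the given character matches.
def unicode_properties(char, candidates = CANDIDATE_PROPERTIES)
  char = char.encode(Encoding::UTF_8)
  candidates.select do |name|
    begin
      # Build the pattern as UTF-8 so Unicode property names are recognized.
      Regexp.new("\\p{#{name}}".encode(Encoding::UTF_8)).match?(char)
    rescue RegexpError
      # The regex engine does not know this property name; skip it.
      false
    end
  end
end

puts unicode_properties("(").join(" ")
puts unicode_properties("\u03B4").join(" ") # GREEK SMALL LETTER DELTA
```

Extending the candidate list (more scripts, more general categories) widens coverage, but Ruby only exposes the properties its regex engine knows about, so this is likely to report fewer properties than uniprops -a does.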
    

    You will probably also want unichars, which goes the other way and lists the characters that match the given properties. Here are some examples of calling it:

     $ unichars -gns '\p{Cased}' '\p{Number}'
     $ unichars '\R'
     $ unichars '\S' '[\v\h]' 
     $ unichars '\S' '\p{space}'   
     $ unichars '\pL' '\p{Greek}'
     $ unichars '\pL' '\p{Greek}' | um
     $ unichars '\p{Age=6.0}'     | um
     $ unichars '\p{Lowercase}' '\P{Lowercase_Letter}' 
     $ unichars '\p{Lower}'     '\P{Ll}'  # same but easier to type
     $ unichars -a '\p{alphabetic}' '\P{Letter}' | wc -l # 1006 code points
     $ unichars -gas '\PL' '\p{Cased}'
     $ unichars -gas '\P{MARK}' '\p{diacritic}'   #  209 code points
     $ unichars -gas '\pM' '\P{BC=NSM}'
     $ unichars -gas '\p{Cased}' '[^\p{CWL}\p{CWT}\p{CWU}]'  
     $ unichars -gas '\p{Dash}'
     $ unichars -gas '\p{mark}' '\P{DIACRITIC}'   # 1068 code points
     $ unichars -gas 'grep { length > 1 } lc, ucfirst, uc'
     $ unichars -gas 'uc ne ucfirst'
     $ unichars -gasn NUM
    

    Here is one example of the output:

    $ unichars -gsn NUM 'int NUM ne NUM'
    ‭ 0  U+0030 GC=Nd      0=NV  SC=Common       DIGIT ZERO
    ‭ ¼  U+00BC GC=No    1/4=NV  SC=Common       VULGAR FRACTION ONE QUARTER
    ‭ ½  U+00BD GC=No    1/2=NV  SC=Common       VULGAR FRACTION ONE HALF
    ‭ ¾  U+00BE GC=No    3/4=NV  SC=Common       VULGAR FRACTION THREE QUARTERS
    ‭ ٠  U+0660 GC=Nd      0=NV  SC=Common       ARABIC-INDIC DIGIT ZERO
    ‭ ۰  U+06F0 GC=Nd      0=NV  SC=Arabic       EXTENDED ARABIC-INDIC DIGIT ZERO
    ‭ ߀  U+07C0 GC=Nd      0=NV  SC=Nko          NKO DIGIT ZERO
    ‭ ०  U+0966 GC=Nd      0=NV  SC=Devanagari   DEVANAGARI DIGIT ZERO
    ‭ ০  U+09E6 GC=Nd      0=NV  SC=Bengali      BENGALI DIGIT ZERO
    ‭ ৴  U+09F4 GC=No   1/16=NV  SC=Bengali      BENGALI CURRENCY NUMERATOR ONE
    ‭ ৵  U+09F5 GC=No    1/8=NV  SC=Bengali      BENGALI CURRENCY NUMERATOR TWO
    ‭ ৶  U+09F6 GC=No   3/16=NV  SC=Bengali      BENGALI CURRENCY NUMERATOR THREE
    ‭ ৷  U+09F7 GC=No    1/4=NV  SC=Bengali      BENGALI CURRENCY NUMERATOR FOUR
    ‭ ৸  U+09F8 GC=No    3/4=NV  SC=Bengali      BENGALI CURRENCY NUMERATOR ONE LESS THAN THE DENOMINATOR
    ‭ ੦  U+0A66 GC=Nd      0=NV  SC=Gurmukhi     GURMUKHI DIGIT ZERO
    ‭ ૦  U+0AE6 GC=Nd      0=NV  SC=Gujarati     GUJARATI DIGIT ZERO
    ‭ ୦  U+0B66 GC=Nd      0=NV  SC=Oriya        ORIYA DIGIT ZERO
    ‭ ୲  U+0B72 GC=No    1/4=NV  SC=Oriya        ORIYA FRACTION ONE QUARTER
    ‭ ୳  U+0B73 GC=No    1/2=NV  SC=Oriya        ORIYA FRACTION ONE HALF
    ‭ ୴  U+0B74 GC=No    3/4=NV  SC=Oriya        ORIYA FRACTION THREE QUARTERS
    ‭ ୵  U+0B75 GC=No   1/16=NV  SC=Oriya        ORIYA FRACTION ONE SIXTEENTH
    ‭ ୶  U+0B76 GC=No    1/8=NV  SC=Oriya        ORIYA FRACTION ONE EIGHTH
    ‭ ୷  U+0B77 GC=No   3/16=NV  SC=Oriya        ORIYA FRACTION THREE SIXTEENTHS
    

    etc.
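
The reverse lookup can be approximated in Ruby as well, by scanning a range of code points and keeping those that match every pattern, much as the unichars examples above combine several conditions. The code_points_matching helper and its range: parameter are hypothetical names for this sketch and are not part of unichars. Only regex patterns are handled here, not Perl expressions such as 'uc ne ucfirst'.

```ruby
# Surrogate code points cannot be encoded as UTF-8 characters, so skip them.
SURROGATES = (0xD800..0xDFFF).freeze

# Returns the code points in `range` whose character matches all patterns.
def code_points_matching(*patterns, range: 0..0x10FFFF)
  regexps = patterns.map { |pattern| Regexp.new(pattern.encode(Encoding::UTF_8)) }
  range.select do |cp|
    next false if SURROGATES.cover?(cp)

    char = cp.chr(Encoding::UTF_8)
    regexps.all? { |re| re.match?(char) }
  end
end

# Roughly analogous to: unichars '\pL' '\p{Greek}', limited to one block.
matches = code_points_matching('\p{L}', '\p{Greek}', range: 0x0370..0x03FF)
matches.first(5).each do |cp|
  printf("U+%04X %s\n", cp, cp.chr(Encoding::UTF_8))
end
```

Scanning the full default range tests over a million code points per call, so narrowing range: to the blocks of interest keeps it quick.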

    I describe these in the first of my OSCON Unicode talks. They are just two of the tools in a suite of a couple of dozen.
