Is there any way to enumerate all of a character’s Unicode properties in Ruby? I can use Ruby 1.9’s Regexp class to test whether a given character has a particular property (e.g., some_char =~ /\p{P}/ tests whether some_char is punctuation), but since a character can have several properties at once (“(”, for example, is both punctuation and ASCII), it would be nice to just get a list of all of a character’s properties.
I could probably do this by hand using UnicodeData.txt, but this seems like the sort of thing that’s probably already been done somewhere. UnicodeUtils doesn’t appear to have anything along these lines, and Googling didn’t turn up anything obvious. Thanks!
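For reference, the one-property-at-a-time Regexp test described above can be looped over a list of candidate names. This is only a rough sketch of that brute-force idea: the method name matching_properties and the CANDIDATE_PROPERTIES list are made up for illustration, and the list is a small hand-picked sample, nowhere near the full set of Unicode properties. It assumes the source file and strings are UTF-8.

```ruby
# encoding: utf-8

# Hypothetical, hand-picked sample of property names that Ruby's regexp
# engine is assumed to accept inside \p{...}. NOT an exhaustive list.
CANDIDATE_PROPERTIES = %w[
  ASCII Alpha Digit Punct Space Upper Lower
  L Lu Ll N Nd P Ps Pe S Sm Z
  Latin Greek Cyrillic Common
].freeze

# Returns the candidate names whose \p{Name} class matches the single
# character in `char`. Names the engine rejects are silently skipped.
def matching_properties(char, candidates = CANDIDATE_PROPERTIES)
  candidates.select do |name|
    begin
      # =~ returns 0 (truthy in Ruby) on a match and nil otherwise.
      char =~ Regexp.new("\\A\\p{#{name}}\\z")
    rescue RegexpError
      false # unknown property name for this Ruby version
    end
  end
end

puts matching_properties("(").join(", ")
```

This only reports properties I already thought to put in the list, which is exactly the limitation I’d like to get around.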
You can call out to my uniprops script.
You probably want to also get unichars so you can go the other way. Here are just the examples of calling it:
Here is one example of the output:
etc.
I describe these in the first of my OSCON Unicode talks. They are just two of the tools in a suite of a couple dozen.