I am parsing some text in Ruby that contains Unicode characters, which I would like to transcribe to ASCII values in one output file and to HTML encoding in another. Is there a simple way of spitting out the non-ASCII characters found in a file? For example:
\u00A0 # should become a " " in the text file, but "&nbsp;" in the HTML output file
I’m going to manually transcribe them based upon my needs and would like to output a list of unique characters I’ll need to transcribe from my initial input file.
Thanks,
Ben
There’s a method that helps to extract the characters found in your string:
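A minimal sketch of one way to do this, assuming the text is UTF-8. It uses `String#chars` and `String#ascii_only?`; the sample string and the variable names are made up for illustration, and real input would come from something like `File.read(path, encoding: "UTF-8")`.

```ruby
# Hypothetical sample input standing in for the contents of the file.
text = "caf\u00E9\u00A0au lait, caf\u00E9"

# Keep only the characters outside the 7-bit ASCII range, then de-duplicate
# so each character that needs transcribing is listed once.
non_ascii = text.chars.reject(&:ascii_only?).uniq

# Print the code point next to the character so invisible ones
# (such as the non-breaking space) are still identifiable.
non_ascii.each do |ch|
  puts format("U+%04X %s", ch.ord, ch.inspect)
end
```

Printing the code point alongside each character helps because some of them, like `\u00A0`, look identical to ordinary ASCII characters on screen.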
Since some of these characters may be multi-byte Unicode characters, you might want to expand them into bytes as well, to be more thorough:
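As a rough sketch, again assuming UTF-8 input, `String#bytes` returns the individual byte values for each character. The sample string and the `pairs` name are hypothetical.

```ruby
# Hypothetical sample; real input would be read from the file as UTF-8.
text = "price:\u00A0100\u00A0\u20AC"

# Pair each distinct non-ASCII character with the bytes that encode it.
pairs = text.chars.reject(&:ascii_only?).uniq.map { |ch| [ch, ch.bytes] }

pairs.each do |ch, bytes|
  puts format("U+%04X => %s", ch.ord, bytes.inspect)
end
# U+00A0 => [194, 160]
# U+20AC => [226, 130, 172]
```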
The array breaks down the specific bytes used to construct that character. In this case the non-breaking space shows up as " " but is actually [194, 160] internally.