I am parsing some text in Ruby that contains Unicode character that I would

Question

Asked: May 29, 2026 (2026-05-29T05:45:45+00:00)

I am parsing some text in Ruby that contains Unicode characters that I would like to transcribe to ASCII in one output file and to HTML entities in another. Is there a simple way of listing the non-ASCII characters found in a file? For example:

\u00A0 # should become a " " in the text output file, but &nbsp; in the HTML output file

I’m going to transcribe them manually based on my needs, so I would like to output a list of the unique characters in my initial input file that need transcribing.

Thanks,
Ben

1 Answer

Editorial Team · Answer 1 · 2026-05-29T05:45:47+00:00

The chars method extracts the individual characters found in your string:

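"foo\u00A0bar".chars # on Ruby 2.0+ this already returns an Array, so to_a is not needed
# => ["f", "o", "o", " ", "b", "a", "r"]
# The fourth element is the non-breaking space (U+00A0), not a plain ASCII space.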

Since some of these characters may be multi-byte Unicode characters, you might want to expand each one into its bytes as well, to be more thorough:

"foo\u00A0bar".chars.map { |c| [c, c.bytes] }
# => [["f", [102]], ["o", [111]], ["o", [111]], [" ", [194, 160]], ["b", [98]], ["a", [97]], ["r", [114]]]

Each inner array lists the specific bytes used to encode that character. In this case the non-breaking space shows up as " " but is actually the two bytes [194, 160] (0xC2 0xA0, the UTF-8 encoding of U+00A0) internally.
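
To get the list of unique non-ASCII characters the question asks for, something along these lines might work. It is only a minimal sketch: `non_ascii_chars`, `codepoint_label`, `transcribe`, and the two mapping hashes are hypothetical names made up for illustration, the sample string stands in for your input file, and the text is assumed to be UTF-8.

```ruby
# Hypothetical helper: sorted, unique non-ASCII characters found in text.
def non_ascii_chars(text)
  text.each_char.reject(&:ascii_only?).uniq.sort
end

# Hypothetical helper: formats a character as a "U+XXXX" code point label.
def codepoint_label(char)
  format("U+%04X", char.ord)
end

# Hypothetical per-output mappings; fill these in by hand from the report below.
ASCII_MAP = { "\u00A0" => " " }.freeze
HTML_MAP  = { "\u00A0" => "&nbsp;" }.freeze

# Replaces each non-ASCII character using map; unmapped characters pass through unchanged.
def transcribe(text, map)
  text.gsub(/[^[:ascii:]]/) { |c| map.fetch(c, c) }
end

# Sample data standing in for the real input file (an assumption).
sample = "foo\u00A0bar \u2019quoted\u2019 caf\u00E9\u00A0"

# Report every unique non-ASCII character with its code point and bytes.
found = non_ascii_chars(sample)
found.each do |c|
  puts "#{codepoint_label(c)} #{c.inspect} #{c.bytes.inspect}"
end

ascii_text = transcribe("foo\u00A0bar", ASCII_MAP)
html_text  = transcribe("foo\u00A0bar", HTML_MAP)
```

For a real file you would presumably swap the sample string for something like `File.read(path, encoding: "UTF-8")`, where `path` is your input file. Leaving unmapped characters untouched in `transcribe` is a deliberate choice here: anything you have not yet transcribed stays visible in the output instead of silently disappearing.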
