In Ruby 1.9.3, I can get the codepoints of a string:
> "foo\u00f6".codepoints.to_a
=> [102, 111, 111, 246]
Is there a built-in method to go the other direction, i.e., from an integer array to a string?
I’m aware of:
# not acceptable; only works with UTF-8
[102, 111, 111, 246].pack("U*")
# works, but not very elegant
[102, 111, 111, 246].inject('') {|s, cp| s << cp }
# concise, but I need to unshift that pesky empty string to "prime" the inject call
['', 102, 111, 111, 246].inject(:<<)
UPDATE (response to Niklas’ answer)
Interesting discussion.
pack("U*") always returns a UTF-8 string, while the inject version returns a string in the file’s source encoding.
#!/usr/bin/env ruby
# encoding: iso-8859-1
p [102, 111, 111, 246].inject('', :<<).encoding
p [102, 111, 111, 246].pack("U*").encoding
# this raises an Encoding::CompatibilityError
[102, 111, 111, 246].pack("U*") =~ /\xf6/
For me, the inject call returns an ISO-8859-1 string, while pack returns a UTF-8 string. To prevent the error, I could use pack("U*").encode(__ENCODING__), but that makes me do extra work.
UPDATE 2
Apparently String#<< doesn’t always append correctly, depending on the string’s encoding. So it looks like pack is still the best option.
[225].inject(''.encode('utf-16be'), :<<) # fails miserably
[225].pack("U*").encode('utf-16be') # works
The most obvious adaptation of your own attempt would be:
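A minimal sketch of that adaptation: pass the empty seed string as inject's initial value instead of unshifting it into the array. The variable names are illustrative, and `''.dup` is only there so the seed stays mutable if string literals are frozen.

```ruby
codepoints = [102, 111, 111, 246]

# String#<< with an Integer argument appends the character for that
# codepoint, interpreted in the receiver's encoding.
# The seed string takes the encoding of the source file.
result = codepoints.inject(''.dup, :<<)

p result
p result.encoding
```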
However, this is not a good solution: it only works if the initial empty string literal has an encoding capable of holding the entire Unicode character range. The following fails:
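A sketch of the kind of failure meant here, assuming the seed string is ISO-8859-1, as it would be in a file with an `# encoding: iso-8859-1` magic comment. The encoding is forced explicitly so the snippet does not depend on the file's source encoding, and the variable names are illustrative.

```ruby
codepoints = "\u{1234}".codepoints.to_a   # => [4660]

# An empty seed string tagged as ISO-8859-1, a single-byte encoding.
latin1_seed = ''.dup.force_encoding(Encoding::ISO_8859_1)

error = begin
  # U+1234 has no representation in ISO-8859-1, so the append raises.
  codepoints.inject(latin1_seed, :<<)
  nil
rescue RangeError => e
  e
end

p error
```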
So I’d actually recommend:
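This is the pack("U*") call from the question, shown here as a short sketch with illustrative variable names:

```ruby
codepoints = [102, 111, 111, 246]

# The "U*" directive encodes every integer in the array as a UTF-8
# character; the resulting string is tagged as UTF-8.
str = codepoints.pack("U*")

p str
p str.encoding
```

If a different encoding is needed afterwards, the result can be transcoded with String#encode.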
I don’t know what you mean by “only works with UTF-8”. It creates a Ruby string with UTF-8 encoding, but UTF-8 can hold the whole Unicode character range, so what’s the problem? Observe:
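A small sketch of the point, with codepoints picked purely for illustration (one ASCII, one from the Basic Multilingual Plane, one outside it):

```ruby
codepoints = [0x66, 0x1234, 0x1F600]

# pack("U*") handles all of them and yields a valid UTF-8 string.
utf8 = codepoints.pack("U*")
p utf8.encoding          # => #<Encoding:UTF-8>
p utf8.valid_encoding?

# Transcoding to another Unicode encoding keeps the same codepoints.
utf16 = utf8.encode("UTF-16BE")
p utf16.encoding
p utf16.codepoints.to_a == codepoints
```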