In Ruby 1.9.3, I can get the codepoints of a string: "foo\u00f6".codepoints.to_a

Question


Asked: June 2, 2026 (2026-06-02T23:57:36+00:00)

In Ruby 1.9.3, I can get the codepoints of a string:

> "foo\u00f6".codepoints.to_a
 => [102, 111, 111, 246]

Is there a built-in method to go in the other direction, i.e. from an array of integers back to a string?

I’m aware of:

# not acceptable; only works with UTF-8
[102, 111, 111, 246].pack("U*")

# works, but not very elegant
[102, 111, 111, 246].inject('') {|s, cp| s << cp }

# concise, but I need to unshift that pesky empty string to "prime" the inject call
['', 102, 111, 111, 246].inject(:<<)

UPDATE (response to Niklas’ answer)

Interesting discussion.
pack("U*") always returns a UTF-8 string, while the inject version returns a string in the file’s source encoding.

#!/usr/bin/env ruby
# encoding: iso-8859-1

p [102, 111, 111, 246].inject('', :<<).encoding
p [102, 111, 111, 246].pack("U*").encoding
# this raises an Encoding::CompatibilityError
[102, 111, 111, 246].pack("U*") =~ /\xf6/

For me, the inject call returns an ISO-8859-1 string, while pack returns a UTF-8 string. To prevent the error, I could use pack("U*").encode(__ENCODING__), but that is an extra step.
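The pack-then-encode workaround mentioned above can be wrapped in a small helper. This is only a sketch: the name codepoints_to_string and its encoding parameter are hypothetical and not part of Ruby or the original post; it relies only on Array#pack and String#encode.

```ruby
# Hypothetical helper (not from the original post): build a string from
# Unicode codepoints, then transcode it to the requested encoding.
def codepoints_to_string(codepoints, encoding = Encoding::UTF_8)
  # pack("U*") always yields a UTF-8 string; encode transcodes it.
  codepoints.pack("U*").encode(encoding)
end

latin1 = codepoints_to_string([102, 111, 111, 246], Encoding::ISO_8859_1)
p latin1.encoding   # the target encoding, not UTF-8
p latin1.bytes.to_a # 246 is stored as a single byte in ISO-8859-1
```

Passing __ENCODING__ as the second argument would give a string in the file's source encoding, which is what the regexp match in the script above needs. Note that encode raises if a codepoint has no representation in the target encoding.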

UPDATE 2

Apparently String#<< with an integer argument doesn't always append correctly; the result depends on the receiving string's encoding. So it looks like pack is still the best option.

[225].inject(''.encode('utf-16be'), :<<)  # fails miserably
[225].pack("U*").encode('utf-16be')  # works

1 Answer

Editorial Team · Answer 1 · 2026-06-02T23:57:37+00:00

The most obvious adaptation of your own attempt would be

[102, 111, 111, 246].inject('', :<<)

This is, however, not a good solution, as it only works if the initial empty string literal has an encoding that can represent the entire Unicode character range. The following fails:

#!/usr/bin/env ruby
# encoding: iso-8859-1
p "\u{1234}".codepoints.to_a.inject('', :<<)
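The same failure can be reproduced without relying on the file's magic comment by forcing the seed string into ISO-8859-1. This is a sketch under that assumption; the variable names are illustrative only.

```ruby
# Seed string in ISO-8859-1, which has no character for U+1234.
seed = String.new.force_encoding(Encoding::ISO_8859_1)

result =
  begin
    # String#<< interprets the integer as a codepoint in the
    # receiver's encoding, so this cannot succeed.
    [0x1234].inject(seed, :<<)
  rescue RangeError => e
    e.class
  end

p result
```

The append is rejected because an integer passed to String#<< is interpreted in the receiver's encoding, not as a Unicode codepoint in general.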

So I’d actually recommend

codepoints.pack("U*")

I don’t know what you mean by “only works with UTF-8”. It creates a Ruby string with UTF-8 encoding, but UTF-8 can hold the whole Unicode character range, so what’s the problem? Observe:

irb(main):010:0> s = [0x33333, 0x1ffff].pack("U*")
=> "\u{33333}\u{1FFFF}"
irb(main):011:0> s.encoding
=> #<Encoding:UTF-8>
irb(main):012:0> [0x33333, 0x1ffff].pack("U*") == [0x33333, 0x1ffff].inject('', :<<)
=> true
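As a quick sanity check of the pack("U*") recommendation, a round trip from string to codepoints and back can be sketched as follows. The sample string is an arbitrary choice and includes a codepoint outside the Basic Multilingual Plane.

```ruby
# Round trip: string -> codepoints -> pack("U*") -> string.
original = "foo\u00f6\u{1F600}"
rebuilt  = original.codepoints.to_a.pack("U*")

p rebuilt == original # both are UTF-8 with identical characters
p rebuilt.encoding
```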
