I’m running Ruby 1.9.2 and trying to fix some broken UTF-8 text input where

Question

Asked: May 25, 20262026-05-25T01:25:15+00:00 2026-05-25T01:25:15+00:00

I’m running Ruby 1.9.2 and trying to fix some broken UTF-8 text input where the text is literally "\\354\\203\\201\\355\\221\\234\\353\\252\\205" and change it into its correct Korean "상표명"

However, after searching for a while and trying a few methods, I still get gibberish.
It’s confusing, because the example with escaped characters in a string literal (the second puts below) works fine:

# encoding: utf-8
puts "상표명" # Target string
# Output: "상표명"

puts "\354\203\201\355\221\234\353\252\205" # Works with escaped characters like this
# Output: "상표명"

# Real input is a string
input = "\\354\\203\\201\\355\\221\\234\\353\\252\\205"

# After some manipulation got it into an array of numbers
puts [354, 203,201,355,221,234,353,252,205].pack('U*').force_encoding('UTF-8')
# Output: ŢËÉţÝêšüÍ (gibberish)

I’m sure this must have been answered somewhere but I haven’t managed to find it.

1 Answer

Editorial Team · Answer 1 · 2026-05-25T01:25:16+00:00

This is what you want to do to get your UTF-8 Korean text:

s = "\\354\\203\\201\\355\\221\\234\\353\\252\\205"
k = s.scan(/\d+/).map { |n| n.to_i(8) }.pack("C*").force_encoding('utf-8')
# "상표명"

And this is how it works:

The input string is nice and regular, so we can use scan to pull out the individual numbers.
Then a map with to_i(8) converts the octal values (as noted by Henning Makholm) to integers.
Now we need to convert our list of integers to bytes, so we pack('C*') to get a byte string. This string will have the BINARY encoding (AKA ASCII-8BIT).
We happen to know that the bytes really do represent UTF-8, so we can force the issue with force_encoding('utf-8').
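
To see what each step produces, the one-liner can be unrolled as in the rough sketch below. The variable names (octal_digits, byte_values, raw, korean) are made up for illustration and are not part of the original answer; dup is used only so the binary string can still be inspected after the UTF-8 copy is made, since force_encoding changes its receiver in place.

```ruby
# encoding: utf-8
# Sketch only: variable names are illustrative, not from the original answer.
input = "\\354\\203\\201\\355\\221\\234\\353\\252\\205"  # literal backslashes + octal digits

octal_digits = input.scan(/\d+/)                   # ["354", "203", "201", ...]
byte_values  = octal_digits.map { |n| n.to_i(8) }  # [236, 131, 129, ...]
raw          = byte_values.pack("C*")              # 9 raw bytes, binary encoding
korean       = raw.dup.force_encoding("utf-8")     # same bytes, now tagged as UTF-8

puts raw.bytesize            # 9
puts korean.length           # 3 (three characters, three bytes each)
puts korean.valid_encoding?  # true
puts korean
```

Note that force_encoding only relabels the bytes; it does not convert them. That is what we want here, because the bytes are already valid UTF-8.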

The main thing that you were missing was your pack format: 'U' means “UTF-8 character” and expects an array of Unicode codepoints, each represented by a single integer, whereas 'C' expects an array of bytes, and that’s what we had.
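
A small side-by-side comparison may make the difference between the two directives more concrete. This is only a sketch: the codepoint values were worked out by hand from the nine bytes above, and the variable names are illustrative. The last part shows what appears to have happened in the question, where the digit groups were read as decimal numbers and packed as codepoints.

```ruby
# encoding: utf-8
# 'U*' takes one integer per Unicode codepoint and emits its UTF-8 encoding.
codepoints      = [0xC0C1, 0xD45C, 0xBA85]
from_codepoints = codepoints.pack("U*")

# 'C*' takes one integer (0..255) per byte; the result is binary until relabelled.
bytes      = [0354, 0203, 0201, 0355, 0221, 0234, 0353, 0252, 0205]  # Ruby octal literals
from_bytes = bytes.pack("C*").force_encoding("utf-8")

puts from_codepoints == from_bytes  # true

# The question's attempt: decimal 354, 203, 201 treated as codepoints,
# so each number becomes its own (unrelated) character.
mistaken = [354, 203, 201].pack("U*")
puts mistaken.codepoints.inspect    # [354, 203, 201]
```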
