I have to convert Latin chars like éáéíóúÀÉÍÓÚ etc., into a string to similar

Asked: May 27, 2026

I have to convert Latin chars like éáéíóúÀÉÍÓÚ etc., into a string of similar ones without special accents or weird symbols:

é -> e
è -> e
Ä -> A

I have a file named “test.rb”:

require 'iconv'

puts Iconv.iconv("ASCII//translit", "utf-8", 'è').join

When I paste those lines into irb it works, returning “e” as expected.

Running:

$ ruby test.rb

I get “?” as output.

I’m using irb 0.9.5(05/04/13) and Ruby 1.8.7 (2011-06-30 patchlevel 352) [i386-linux].

1 Answer

Editorial Team · Answer 1 · 2026-05-27T09:19:29+00:00

Ruby 1.8.7 is not multibyte-aware the way 1.9+ is. In general, it treats a string as a series of bytes rather than characters. If you need better handling of such characters, consider upgrading to 1.9+.

James Gray has a series of articles about dealing with multibyte characters in Ruby 1.8. I highly recommend taking the time to read through them. It’s a complex subject, so you’ll want to read the entire series a couple of times.

Also, 1.8 encoding support needs the $KCODE flag set:

$KCODE = "U"

so you’ll need to add that to code running in 1.8.

Here is a bit of sample code:

#encoding: UTF-8

require 'rubygems'
require 'iconv'

chars = "éáéíóúÀÉÍÓÚ"

puts Iconv.iconv("ASCII//translit", "utf-8", chars)

puts chars.split('')
puts chars.split('').join

Using ruby 1.8.7 (2011-06-30 patchlevel 352) [x86_64-darwin10.7.0] and running it in IRB, I get:

1.8.7 :001 > #encoding: UTF-8
1.8.7 :002 >   
1.8.7 :003 >   require 'iconv'
true
1.8.7 :004 > 
1.8.7 :005 >   chars = "\303\251\303\241\303\251\303\255\303\263\303\272\303\200\303\211\303\215\303\223\303\232"
"\303\251\303\241\303\251\303\255\303\263\303\272\303\200\303\211\303\215\303\223\303\232"
1.8.7 :006 > 
1.8.7 :007 >   puts Iconv.iconv("ASCII//translit", "utf-8", chars)
'e'a'e'i'o'u`A'E'I'O'U
nil
1.8.7 :008 > 
1.8.7 :009 >   puts chars.split('')
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
nil
1.8.7 :010 > puts chars.split('').join
éáéíóúÀÉÍÓÚ

At line 9 in the output I told Ruby to split the string into its concept of characters, which in 1.8.7 means bytes. Each resulting ‘?’ means a lone byte of a multibyte character could not be displayed on its own. At line 10 I split again, which produced the same array of bytes, and join then reassembled them into the original string, so the multibyte characters display normally.
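If you need to split by character rather than by byte without relying on 1.9's string handling, one possible sketch is to go through code points with unpack and pack. This assumes the string holds valid UTF-8; the variable names are only illustrative.

```ruby
# encoding: UTF-8

chars = "éáéíóúÀÉÍÓÚ"

# unpack('U*') decodes the UTF-8 bytes into an array of integer code points.
codepoints = chars.unpack('U*')

# Re-pack each code point on its own to get one string per character.
characters = codepoints.map { |cp| [cp].pack('U') }

puts characters.length        # 11 for this string
puts characters.join == chars # the pieces reassemble into the original
```

Because this works on code points rather than on Ruby's notion of a character, it does not depend on how a given Ruby version implements split('').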

Running the same code using Ruby 1.9.2 shows better, and more expected and desirable, behavior:

1.9.2p290 :001 > #encoding: UTF-8
1.9.2p290 :002 >   
1.9.2p290 :003 >   require 'iconv'
true
1.9.2p290 :004 > 
1.9.2p290 :005 >   chars = "éáéíóúÀÉÍÓÚ"
"éáéíóúÀÉÍÓÚ"
1.9.2p290 :006 > 
1.9.2p290 :007 >   puts Iconv.iconv("ASCII//translit", "utf-8", chars)
'e'a'e'i'o'u`A'E'I'O'U
nil
1.9.2p290 :008 > 
1.9.2p290 :009 >   puts chars.split('')
é
á
é
í
ó
ú
À
É
Í
Ó
Ú
nil
1.9.2p290 :010 > puts chars.split('').join
éáéíóúÀÉÍÓÚ

Ruby maintained the multibyte-ness of the characters, through the split('').
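A quick way to see the difference between bytes and characters, assuming Ruby 1.9 or later and a UTF-8 source file, is to compare the two counts directly. This is only a small illustration, not part of the original session:

```ruby
# encoding: UTF-8

chars = "éáéíóúÀÉÍÓÚ"

# bytesize counts raw bytes; length counts characters in 1.9+.
byte_count = chars.bytesize
char_count = chars.length

puts "bytes: #{byte_count}, characters: #{char_count}"
```

Each of these accented letters takes two bytes in UTF-8, which is why the 1.8.7 session above printed 22 ‘?’ lines for 11 characters.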

Notice that in both cases Iconv.iconv did the right thing: it created characters that were visually similar to the input characters. While the leading apostrophe looks out of place, it’s there as a reminder that the characters were originally accented.
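If the leading apostrophes are unwanted, a different approach from the Iconv call above is to decompose each character and drop the accent marks. The sketch below assumes Ruby 2.2 or later, where String#unicode_normalize is available; strip_accents is a hypothetical helper name, not something from the original code.

```ruby
# encoding: UTF-8

# Hypothetical helper (the name is an assumption for this example).
# NFD normalization splits each accented letter into a base letter plus
# a combining mark; the gsub then deletes the combining marks (category Mn).
def strip_accents(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

puts strip_accents("éáéíóúÀÉÍÓÚ")
puts strip_accents("è Ä")
```

This only handles characters that decompose into a base letter plus combining marks; letters such as ø or ß have no such decomposition and pass through unchanged.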

For more information, search Stack Overflow for questions tagged [ruby] and [iconv].
