Editorial Team
Asked: May 27, 2026

I have to convert Latin characters in a string, like éáéíóúÀÉÍÓÚ, into similar ones without accents or weird symbols:

é -> e
è -> e
Ä -> A

I have a file named “test.rb”:

require 'iconv'

puts Iconv.iconv("ASCII//translit", "utf-8", 'è').join

When I paste those lines into irb it works, returning “e” as expected.

Running:

$ ruby test.rb

I get “?” as output.

I’m using irb 0.9.5(05/04/13) and Ruby 1.8.7 (2011-06-30 patchlevel 352) [i386-linux].

1 Answer

  1. Editorial Team
    Added an answer on May 27, 2026 at 9:19 am

    Ruby 1.8.7 is not multibyte-character savvy the way 1.9+ is. In general, it treats a string as a series of bytes rather than characters. If you need better handling of such characters, consider upgrading to 1.9+.
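
To make the byte-versus-character distinction concrete, here is a minimal sketch. It assumes Ruby 1.9 or later (1.8.7 does not understand the \u escape), and the constant name E_ACUTE is only illustrative.

```ruby
# encoding: UTF-8
# Assumes Ruby 1.9+; E_ACUTE is an illustrative name, not from the original post.
E_ACUTE = "\u00E9"      # the single character "é"

puts E_ACUTE.length     # 1 (characters)
puts E_ACUTE.bytesize   # 2 (bytes in UTF-8)
p E_ACUTE.bytes.to_a    # [195, 169], which is "\303\251" in octal
```

Those two bytes are the same \303\251 pair that 1.8.7 shows when it echoes an accented string in IRB.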

    James Gray has a series of articles about dealing with multibyte characters in Ruby 1.8. I highly recommend taking the time to read through them. It’s a complex subject, so you’ll want to read the entire series a couple of times.

    Also, 1.8 encoding support needs the $KCODE flag set:

    $KCODE = "U"
    

    so you’ll need to add that to code running in 1.8.
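
As a hedged sketch of character-aware splitting that should behave the same on old and new interpreters: the snippet below sets $KCODE only on 1.8 and splits with a /u regular expression. The helper name utf8_chars is hypothetical, and the 1.8 behavior is an assumption based on how $KCODE and the /u flag are described for that version.

```ruby
# encoding: UTF-8
# Only 1.8 honors $KCODE; later versions ignore it, so set it conditionally.
$KCODE = "U" if RUBY_VERSION < "1.9"

# utf8_chars is a hypothetical helper: the /u flag asks the regexp engine
# to treat the string as UTF-8, so each match is one whole character.
def utf8_chars(str)
  str.scan(/./mu)
end

sample = "\303\251\303\241"   # UTF-8 bytes of "éá", written as octal escapes
puts utf8_chars(sample).length
```

The octal escapes keep the source file pure ASCII, which sidesteps any question about how the interpreter reads the file itself.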

    Here is a bit of sample code:

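    # encoding: UTF-8

    require 'rubygems'
    require 'iconv'

    chars = "éáéíóúÀÉÍÓÚ"

    # Transliterate the UTF-8 input to the closest ASCII equivalents.
    puts Iconv.iconv("ASCII//translit", "utf-8", chars)

    # Split into "characters": bytes in 1.8, real characters in 1.9+.
    puts chars.split('')
    # Splitting and rejoining restores the original string.
    puts chars.split('').join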
    

    Using ruby 1.8.7 (2011-06-30 patchlevel 352) [x86_64-darwin10.7.0] and running it in IRB, I get:

    1.8.7 :001 > #encoding: UTF-8
    1.8.7 :002 >   
    1.8.7 :003 >   require 'iconv'
    true
    1.8.7 :004 > 
    1.8.7 :005 >   chars = "\303\251\303\241\303\251\303\255\303\263\303\272\303\200\303\211\303\215\303\223\303\232"
    "\303\251\303\241\303\251\303\255\303\263\303\272\303\200\303\211\303\215\303\223\303\232"
    1.8.7 :006 > 
    1.8.7 :007 >   puts Iconv.iconv("ASCII//translit", "utf-8", chars)
    'e'a'e'i'o'u`A'E'I'O'U
    nil
    1.8.7 :008 > 
    1.8.7 :009 >   puts chars.split('')
    ?
    ?
    ?
    ?
    ?
    ?
    ?
    ?
    ?
    ?
    ?
    ?
    ?
    ?
    ?
    ?
    ?
    ?
    ?
    ?
    ?
    ?
    nil
    1.8.7 :010 > puts chars.split('').join
    éáéíóúÀÉÍÓÚ
    

    At line 9 in the output I told Ruby to split the string into its concept of characters, which in 1.8.7 means bytes. Each resulting ‘?’ means the terminal didn’t know how to display a lone byte. At line 10 I told it to split, which produced an array of bytes, and join then reassembled them into the original string, allowing the multibyte characters to be displayed normally.
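
The split-then-join behavior can be reproduced on a newer interpreter. The following is a sketch assuming Ruby 1.9 or later; split_into_byte_strings is a hypothetical helper that imitates what 1.8.7's split('') did by producing one single-byte string per byte.

```ruby
# encoding: UTF-8
# Assumes Ruby 1.9+. split_into_byte_strings is a hypothetical helper that
# imitates 1.8.7's split('') by returning one single-byte string per byte.
def split_into_byte_strings(str)
  str.each_byte.map { |b| [b].pack("C").force_encoding(str.encoding) }
end

chars  = "\u00E9\u00E1"                      # "éá"
pieces = split_into_byte_strings(chars)

puts pieces.length                           # 4: two bytes per character
puts pieces.none? { |s| s.valid_encoding? }  # true: no lone byte is valid UTF-8
puts pieces.join == chars                    # true: rejoining restores the string
```

Each piece on its own is an incomplete UTF-8 sequence, which is why it cannot be displayed, while the joined result is byte-for-byte the original string.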

    Running the same sample code in IRB under Ruby 1.9.2 shows the more expected and desirable behavior:

    1.9.2p290 :001 > #encoding: UTF-8
    1.9.2p290 :002 >   
    1.9.2p290 :003 >   require 'iconv'
    true
    1.9.2p290 :004 > 
    1.9.2p290 :005 >   chars = "éáéíóúÀÉÍÓÚ"
    "éáéíóúÀÉÍÓÚ"
    1.9.2p290 :006 > 
    1.9.2p290 :007 >   puts Iconv.iconv("ASCII//translit", "utf-8", chars)
    'e'a'e'i'o'u`A'E'I'O'U
    nil
    1.9.2p290 :008 > 
    1.9.2p290 :009 >   puts chars.split('')
    é
    á
    é
    í
    ó
    ú
    À
    É
    Í
    Ó
    Ú
    nil
    1.9.2p290 :010 > puts chars.split('').join
    éáéíóúÀÉÍÓÚ
    

    Ruby preserved the multibyte characters through the split('').

    Notice that in both cases Iconv.iconv did the right thing: it produced characters that were visually similar to the input characters. While the leading apostrophe (or backtick, for À) looks out of place, it’s there as a reminder that the characters were originally accented.
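
If the goal is a plain é -> e with no leftover apostrophes, one alternative on newer interpreters is Unicode decomposition rather than Iconv. This is a different technique from the one shown above, and it assumes Ruby 2.2 or later, where String#unicode_normalize is available. The helper name strip_accents is hypothetical.

```ruby
# encoding: UTF-8
# Assumes Ruby 2.2+ (String#unicode_normalize). Not Iconv-based.
# strip_accents is a hypothetical helper name.
def strip_accents(str)
  # NFD splits "é" into "e" plus a combining accent (category Mn),
  # and the gsub then removes the combining marks.
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

puts strip_accents("\u00E9\u00E8\u00C4")   # prints "eeA" (input is "éèÄ")
```

Letters with no canonical decomposition, such as ø or ß, pass through unchanged, so this is not a full transliterator.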

    For more information, see the related questions or search Stack Overflow for the [ruby] and [iconv] tags.
