The Archive Base

Editorial Team
Asked: June 11, 2026

I get sources from the web, and sometimes the encoding of the material is not 100% valid UTF-8. I use iconv to silently ignore the invalid byte sequences and get a cleaned string.

require 'iconv'

# Convert UTF-8 to UTF-8, silently dropping invalid byte sequences (//IGNORE).
@iconv = Iconv.new('UTF-8//IGNORE', 'UTF-8')
valid_string = @iconv.iconv(untrusted_string)

However, now that iconv has been deprecated, I see its deprecation warning a lot:

iconv will be deprecated in the future, use String#encode

I tried converting it using String#encode's :invalid and :replace options, but it does not seem to work (i.e. the incorrect byte sequence is not removed). What is the correct way to use String#encode for this?

1 Answer

  1. Editorial Team
    Added an answer on June 11, 2026 at 8:47 pm

    The question that Martijn linked to has what seem to be the two best ways to do this, but Martijn made an understandable yet incorrect change when copying the second approach into his answer here. Doing .encode('UTF-8', <options>).encode('UTF-8') doesn't work. As the original answer in the other question indicates, the key is to encode to a different encoding and then back to UTF-8. If your original string is already flagged as UTF-8 in Ruby's internals, then Ruby (1.9.3 in the examples below) ignores any call to encode it as UTF-8, so the :invalid and :replace options are never applied.

    In the following examples I'm going to use "a#{0xFF.chr}b".force_encoding('UTF-8') to produce a string that Ruby believes is UTF-8 but which contains an invalid UTF-8 byte.

    1.9.3p194 :019 > "a#{0xFF.chr}b".force_encoding('UTF-8')
     => "a\xFFb" 
    1.9.3p194 :020 > "#{0xFF.chr}".force_encoding('UTF-8').encoding
     => #<Encoding:UTF-8> 
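The encoding label alone does not say whether the bytes are valid. As a minimal sketch, String#valid_encoding? can be used to confirm that the test string really is broken; the variable name suspect is only illustrative:

```ruby
# Build a string that is labelled UTF-8 but contains the invalid byte 0xFF.
suspect = "a#{0xFF.chr}b".force_encoding('UTF-8')

# The label says UTF-8...
puts suspect.encoding

# ...but the byte sequence does not pass validation.
puts suspect.valid_encoding?
```

This check is also a cheap way to decide whether a string needs cleaning at all before converting it.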
    

    Note how encoding to UTF-8 does nothing:

    1.9.3p194 :016 > "a#{0xFF.chr}b".force_encoding('UTF-8').encode('UTF-8', :invalid => :replace, :replace => '').encode('UTF-8')
     => "a\xFFb" 
    

    But encoding to something else (UTF-16) and then back to UTF-8 cleans up the string:

    1.9.3p194 :017 > "a#{0xFF.chr}b".force_encoding('UTF-8').encode('UTF-16', :invalid => :replace, :replace => '').encode('UTF-8')
     => "ab" 
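The round trip above can be wrapped in a small helper to stand in for the old Iconv call. This is only a sketch: the method name strip_invalid_utf8 and the variable names are hypothetical, and it assumes the input is meant to be UTF-8. The last line also tries String#scrub, which is available on Ruby 2.1 and later and may be a simpler option there.

```ruby
# Hypothetical helper (the name is illustrative, not from the answer above).
# It round-trips through UTF-16 so that Ruby actually runs a conversion and
# drops byte sequences that are not valid UTF-8.
def strip_invalid_utf8(untrusted_string)
  untrusted_string
    .dup                      # leave the caller's string untouched
    .force_encoding('UTF-8')  # label the bytes as UTF-8 without changing them
    .encode('UTF-16', :invalid => :replace, :replace => '')
    .encode('UTF-8')
end

dirty = "a#{0xFF.chr}b"
clean = strip_invalid_utf8(dirty)

puts clean.inspect
puts clean.valid_encoding?

# On Ruby 2.1 and later, String#scrub is a built-in alternative.
puts dirty.dup.force_encoding('UTF-8').scrub('').inspect if ''.respond_to?(:scrub)
```

Valid multi-byte characters survive the round trip, so the helper can be applied to every incoming string rather than only to ones known to be damaged.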
    