The Archive Base

Editorial Team
Asked: May 22, 2026

I am having trouble handling text files of tabulated data generated on a Windows machine.
I’m working in Ruby 1.8. The following gives an error (“\000” (Iconv::InvalidCharacter)) when processing the SECOND line from the file. The first line is converted properly.

require 'iconv'
conv = Iconv.new("UTF-8//IGNORE","UTF-16")
infile = File.open(tabfile, "r")
while (line = infile.gets)
  line = conv.iconv(line.strip)  # FAILS HERE
  puts line
  # DO MORE STUFF HERE
end

The strange thing is that it reads and converts the first line in the file with no problem.
I have the //IGNORE flag in the Iconv constructor; I thought this was supposed to suppress this kind of error.

I’ve been going in circles for a while. Any advice would be highly appreciated.

Thanks!

EDIT:
hobbs’s solution fixes this. Thank you.
Simply change the code to:

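require 'iconv'
conv = Iconv.new("UTF-8//IGNORE","UTF-16")
File.open(tabfile, "r") do |infile|
  # Split on the UTF-16LE newline (0x0a 0x00), not on a bare 0x0a byte.
  while (line = infile.gets("\x0a\x00"))
    # Convert first, then strip: stripping raw UTF-16 bytes can leave an odd byte count.
    line = conv.iconv(line).strip  # NO LONGER FAILS HERE
    # DOES MORE STUFF HERE
  end
end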

Now I’ll just need to find a way to automatically determine which gets separator to use.
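One way to choose the separator automatically is to sniff the byte-order mark (BOM) at the start of the file. The sketch below is written for modern Ruby (2.0 or later, for String#b) rather than the Ruby 1.8 of the question, and assumes the UTF-16 export starts with a BOM, which Windows tools commonly write but which is not guaranteed. newline_separator_for is a hypothetical helper name.

```ruby
require "tempfile"

# Hypothetical helper: sniff the first two bytes for a UTF-16 byte-order mark
# and return the matching newline separator for IO#gets.
# Assumes UTF-16 files carry a BOM; files without one fall through to "\n".
def newline_separator_for(path)
  head = File.open(path, "rb") { |f| f.read(2) } || ""  # read returns nil on an empty file
  case head.bytes.to_a
  when [0xFF, 0xFE] then "\x0a\x00".b  # UTF-16LE BOM: newline is 0x0a 0x00
  when [0xFE, 0xFF] then "\x00\x0a".b  # UTF-16BE BOM: newline is 0x00 0x0a
  else "\n"                            # no UTF-16 BOM: fall back to a single 0x0a
  end
end

# Usage with a throwaway file standing in for the real tab-separated export.
Tempfile.open("tabdata") do |tmp|
  tmp.binmode
  tmp.write("\uFEFFa\tb\r\n".encode("UTF-16LE").b)
  tmp.flush
  p newline_separator_for(tmp.path).bytes.to_a  # => [10, 0]
end
```

The result can then be passed straight to gets, e.g. infile.gets(newline_separator_for(tabfile)). Note that a UTF-32LE file also begins with 0xFF 0xFE, so this sniff assumes UTF-32 is not in play.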

1 Answer

  1. Editorial Team
    Added an answer on May 22, 2026 at 6:16 pm

    The error message is pretty vague, but I think it’s unhappy about the fact that it’s found an odd number of bytes on a line, since every character in UTF-16 is two (or occasionally four) bytes. And I think the reason for that is your use of gets: the lines in your file are separated by a UTF-16LE newline, which is 0x0a 0x00, but gets is splitting on (and strip is removing) 0x0a only.
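Those byte counts are easy to check. This is a small sketch in modern Ruby (1.9 or later, where String#encode took over the job of Iconv, which the question’s Ruby 1.8 code relies on); utf16le_bytes is a hypothetical helper name.

```ruby
# Hypothetical helper: the raw bytes of a string once encoded as UTF-16LE.
# Requires Ruby 1.9+ for String#encode.
def utf16le_bytes(str)
  str.encode("UTF-16LE").bytes.to_a
end

p utf16le_bytes("a")               # => [97, 0]  (a BMP character: 2 bytes)
p utf16le_bytes("\n")              # => [10, 0]  (the newline is 0x0a 0x00, not a lone 0x0a)
p utf16le_bytes("\u{1F600}").size  # => 4        (outside the BMP: a surrogate pair)
```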

    To illustrate: suppose the file contains

    ab
    cd
    

    encoded in UTF-16LE. That’s

    0x61 0x00 0x62 0x00 0x0a 0x00 0x63 0x00 0x64 0x00 0x0a 0x00
        a         b         \n        c         d         \n
    

    gets reads up to the first 0x0a, which strip removes, so the first line read is 0x61 0x00 0x62 0x00, which iconv happily accepts and encodes to UTF-8 as 0x61 0x62 (“ab”). gets then reads up to the next 0x0a, which strip again removes, so the second time around line gets 0x00 0x63 0x00 0x64 0x00, and now everything is screwed up: we’re out of sync by one byte and there’s an odd number of bytes to convert, so iconv blows up because that’s incompatible with what you asked it to do.
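The walkthrough can be reproduced without a file. This sketch assumes modern Ruby (2.0+), with String#encode standing in for Iconv (removed from the standard library in 2.0); split_on_lf_and_decode is a hypothetical helper that mimics splitting on a bare 0x0a and dropping it, and chomp is used in place of strip so that only the 0x0a is removed, as in the walkthrough.

```ruby
# Hypothetical helper: split raw bytes after every bare 0x0a (as gets("\n") does),
# drop that 0x0a, and try to decode each piece as UTF-16LE.
def split_on_lf_and_decode(bytes)
  bytes.each_line("\n").map do |chunk|
    piece = chunk.chomp("\n")
    begin
      piece.force_encoding("UTF-16LE").encode("UTF-8")
    rescue Encoding::InvalidByteSequenceError
      :invalid  # odd byte count: the trailing byte is half a character
    end
  end
end

data = "ab\ncd\n".encode("UTF-16LE").b
p data.bytes.to_a.map { |b| format("0x%02x", b) }
p split_on_lf_and_decode(data)  # => ["ab", :invalid, :invalid]
```

The third piece is the lone 0x00 left over after the final 0x0a, which is the same one-byte misalignment seen on the second line.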

    Absent an actual working file encoding/decoding layer, I think what you want is to change the gets separator from "\n" ("\x0a") to "\x0a\x00", abandon all use of strip since it’s not encoding-clean, and use print instead of puts so that you don’t add extra line-ends (since you’ll be converting the ones you’ve already got).
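Here is that fix as a byte-level sketch. It assumes Ruby 2.0+ and uses String#encode in place of Iconv; each_utf16le_line is a hypothetical helper, and a StringIO stands in for the real file.

```ruby
require "stringio"

# The two-byte UTF-16LE newline, as raw bytes.
UTF16LE_NEWLINE = "\x0a\x00".b

# Hypothetical helper: read lines on the two-byte separator and decode each one
# to UTF-8 without stripping anything first.
def each_utf16le_line(io)
  while (raw = io.gets(UTF16LE_NEWLINE))
    # raw still ends in 0x0a 0x00, so its byte count stays even
    yield raw.force_encoding("UTF-16LE").encode("UTF-8")
  end
end

# Usage: a StringIO standing in for the file (no BOM, to keep the demo short).
io = StringIO.new("ab\ncd\n".encode("UTF-16LE").b)
each_utf16le_line(io) { |line| print line }  # print, not puts: each line keeps its own "\n"
```

Matching the separator byte by byte is still an approximation: the bytes 0x0a 0x00 can straddle two characters (U+0A41 followed by U+0100 encodes as 0x41 0x0a 0x00 0x01), which is one reason a real decoding layer is preferable when one is available.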

    If you’re working with Windows files, a Windows CRLF in UTF-16LE is "\x0d\x00\x0a\x00".
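On Ruby 1.9 and later (an assumption relative to the question’s 1.8), IO can transcode on read, which supplies the decoding layer mentioned above. The sketch below uses a hypothetical helper, read_utf16le_lines, and a made-up sample file with a BOM and CRLF line endings: "rb" is needed for UTF-16, "BOM|UTF-16LE" skips a byte-order mark if present, and ":UTF-8" transcodes so that gets and chomp see characters rather than raw bytes.

```ruby
require "tempfile"

# Hypothetical helper: let Ruby's IO layer decode UTF-16LE into UTF-8 lines.
def read_utf16le_lines(path)
  File.open(path, "rb:BOM|UTF-16LE:UTF-8") do |f|
    f.each_line.map(&:chomp)  # chomp drops "\r\n" as well as "\n"
  end
end

# Usage: a throwaway Windows-style file with a BOM and CRLF line endings.
sample = "\uFEFFab\r\ncd\r\n".encode("UTF-16LE").b
p sample.bytes.to_a.last(4).map { |b| format("0x%02x", b) }  # the CRLF bytes

Tempfile.open("tabdata") do |tmp|
  tmp.binmode
  tmp.write(sample)
  tmp.flush
  p read_utf16le_lines(tmp.path)  # => ["ab", "cd"]
end
```

Because the lines are already UTF-8 when they reach the loop, strip and chomp are safe again.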

Related Questions

A poorly-written back-end system we interface with is having trouble with handling the load
I having trouble in dividing the HTML frames. I have been using the following
I am having some trouble provding a Win32 tooltips control with dynamic text in
I am having trouble handling the selections in DataGridView . My grid view contains
I'm having trouble handling the scenario whereby an event is being raised to a
I'm having trouble handling IDs of my databse tables using OpenJPA and HSQLdb. I
I'm building a simple interpreter in python and I'm having trouble handling differing numbers
I have an application that's having some trouble handling multi-processor systems. It's not an
I am having trouble handling a MouseEvent.MOUSE_DOWN on an component with a Image for
I'm having trouble handling a JSON object that I'm getting back from an AJAX

© 2021 The Archive Base. All Rights Reserved