I just upgraded from Ruby 1.8 to 1.9, and most of my text processing


Asked: May 16, 2026


I just upgraded from Ruby 1.8 to 1.9, and most of my text processing scripts now fail with the error "invalid byte sequence in UTF-8". I need to either strip out the invalid characters or tell Ruby to use ASCII encoding instead (or whatever encoding the C stdio functions write, which is how the files were produced). How would I go about doing either of those things?

Preferably the latter, because (as near as I can tell) there’s nothing wrong with the files on disk — if there are weird, invalid characters they don’t appear in my editor…


1 Answer

Editorial Team · Answer 1 · 2026-05-16T20:28:23+00:00

What’s your locale set to in the shell? On Linux-based systems you can check it by running the locale command, and change it with, for example:

$ export LANG=en_US

My guess is that your locale settings specify a UTF-8 encoding, which causes Ruby to assume that the text files were written as UTF-8. You can see the effect by trying:

$ LANG=en_GB ruby -e 'warn "foo".encoding.name'
US-ASCII
$ LANG=en_GB.UTF-8 ruby -e 'warn "foo".encoding.name'
UTF-8
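
If changing the locale is not practical, the encoding can also be stated per file inside the script. The following is only a minimal sketch: the helper name read_with_encoding, the sample bytes, and the choice of ISO-8859-1 are assumptions made for illustration, so substitute whatever encoding your files were actually written in.

```ruby
require "tempfile"

# Hypothetical helper: read a whole file, tagging its bytes with the
# given external encoding instead of the locale default.
def read_with_encoding(path, encoding)
  # The "r:<encoding>" mode string sets the external encoding for
  # this one file handle only.
  File.open(path, "r:#{encoding}") { |f| f.read }
end

# Assumed sample data: "caf" followed by byte 0xE9, which is "é" in
# ISO-8859-1 but is not a valid UTF-8 sequence on its own.
sample = Tempfile.new("sample")
sample.binmode
sample.write([0x63, 0x61, 0x66, 0xE9, 0x0A].pack("C*"))
sample.close

as_utf8   = read_with_encoding(sample.path, "UTF-8")
as_latin1 = read_with_encoding(sample.path, "ISO-8859-1")

puts as_utf8.valid_encoding?    # the same bytes, tagged as UTF-8
puts as_latin1.valid_encoding?  # the same bytes, tagged as ISO-8859-1
puts as_latin1.encode("UTF-8")  # transcode once the source encoding is known
```

The same default can be set for a whole script with Encoding.default_external = "ISO-8859-1", or from the command line with ruby -E ISO-8859-1; which of these fits best depends on how many files and scripts are affected.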

For a more general treatment of how string encoding has changed in Ruby 1.9, I thoroughly recommend:
http://blog.grayproductions.net/articles/ruby_19s_string

(The shell examples assume bash or a similar shell; C-shell derivatives are different.)
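
As for the other option in the question, stripping the offending bytes, here is a rough sketch of one common approach. The helper name strip_non_ascii and the sample bytes are made up for illustration. Note the assumption baked in: treating the input as binary means every byte above 0x7F is dropped, including bytes that belong to perfectly valid multibyte characters, so this only suits data that should be plain ASCII.

```ruby
# Hypothetical helper: treat the input as raw bytes and transcode to
# UTF-8, replacing anything that has no mapping (every byte >= 0x80)
# with the empty string.
def strip_non_ascii(raw)
  raw.encode("UTF-8", "binary",
             :invalid => :replace, :undef => :replace, :replace => "")
end

# Assumed sample data: "abc", one stray 0xFF byte, then "def".
dirty = [0x61, 0x62, 0x63, 0xFF, 0x64, 0x65, 0x66].pack("C*")
clean = strip_non_ascii(dirty)

puts clean                  # the ASCII bytes that survived
puts clean.encoding         # the result is tagged UTF-8
puts clean.valid_encoding?
```

On Ruby 2.1 and later, String#scrub is a gentler alternative that removes or replaces only the invalid sequences and leaves valid non-ASCII characters alone; it is not available in 1.9.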
