I’m reading “The Ruby Programming Language”. In section 3.2.6.1, “Multibyte characters in Ruby 1.9”, the book describes an optimization in Ruby’s string handling:
If a string literal contains only 7-bit ASCII characters, then its encoding method will return ASCII, even if the source encoding is UTF-8
I tried the following simple script on Ruby 1.9.1-p431, 1.9.2, and 1.9.3-p125, and all three report UTF-8 as the encoding for a string of 7-bit ASCII characters.
# coding: utf-8
s = 'hello'
p s.encoding
# result is #<Encoding:UTF-8>
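A slightly longer check, as a rough sketch rather than anything taken from the book: it compares the literal's encoding with __ENCODING__ (the source encoding), uses ascii_only? to show that Ruby still knows the string holds only 7-bit characters, and uses force_encoding to retag a copy as US-ASCII. The variable names s, t, and ascii are only for illustration.

```ruby
# coding: utf-8

s = 'hello'          # only 7-bit ASCII characters
t = "h\u00e9llo"     # contains one non-ASCII character (e with acute accent)

# On the versions tested above, a plain literal takes the source encoding.
p s.encoding
p s.encoding == __ENCODING__

# The string is still known to be pure ASCII, whatever its encoding tag says.
p s.ascii_only?
p t.ascii_only?

# Retagging a copy as US-ASCII changes the tag, not the bytes.
ascii = s.dup.force_encoding('US-ASCII')
p ascii.encoding
p ascii == s
```

So even though the literal is tagged UTF-8, ascii_only? gives the information the book's optimization was about, and the retagged copy still compares equal to the original.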
I guess this behavior changed during the development of Ruby 1.9. I searched Ruby 1.9’s changelog, and the 1.9.1 changelog confirms this behavior. I also cloned Ruby’s git repository, but I can’t find a commit that mentions changing it.
Update:
Looking at Ruby’s source code repository, I guess this was the behavior in Ruby 1.9.0, which was released in January 2008. (It failed to compile on Debian 6, so I can’t confirm this exactly.) “The Ruby Programming Language” is an excellent book, but it was originally published in 2008, so it’s very likely that some of its descriptions are outdated.
Another outdated description concerns the behavior of the Encoding.list method. So watch out for outdated descriptions if you are also reading this book.
I don’t have that book, but the current PDF version of the Programming Ruby book (the Pickaxe) states
It then gives an example where "dog" gains the UTF-8 encoding. It looks like the edition of the book you have is wrong. Whether that was an erratum in the print version of your book or just the fact that Ruby changed after it was printed, I don’t know.