I have a pretty simple Rails question regarding encoding that I can’t find an answer to.
Environment:
Rails 2.3.2 / Ruby 1.8.6
I am not currently setting any encoding options in the Rails environment; everything is left at the defaults.
If I read a String from a text file on disk and send it to the client with Rails’ render :text under Apache/Phusion, what encoding should the client expect?
Thank you for any answers.
Since about Rails 1.2, Rails has set Ruby 1.8’s $KCODE magic variable to “UTF8”. It also includes ActiveSupport::CoreExtensions::String::Multibyte to patch around operators that are otherwise ambiguous between per-character and per-byte behaviour. Your text file should be saved as UTF-8; Ruby will pass its bytes through unchanged. Your application layout should also declare the document’s charset as UTF-8 with a META tag, for example <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />.
Then it should all ‘just work’, but there are some gotchas described below.
If you’re on a Mac, running “script/console” in Terminal.app and then pasting unusual character sequences directly into the terminal from e.g. the Character Viewer is a good way to play around and demonstrate this to your own satisfaction, since the whole OS works in UTF-8. I don’t know what the equivalent would be for Windows or an arbitrary Linux distribution.
For example, “⇒” (RIGHTWARDS DOUBLE ARROW) is Unicode U+21D2, which in UTF-8 is the three bytes 0xE2 (226), 0x87 (135), 0x92 (146). If I paste that into Terminal and ask for the byte values, I get the expected 226, 135 and 146.
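The byte values can also be checked without pasting anything into a terminal. The following is only a minimal sketch using core Ruby's String#unpack; the helper names utf8_bytes and code_points are made up for this illustration and are not part of Rails or Ruby. The character is written as explicit byte escapes so the result does not depend on the terminal's encoding.

```ruby
# Hypothetical helpers for illustration only (not a Rails or Ruby API).

# Returns one integer per byte of the string.
def utf8_bytes(str)
  str.unpack("C*")
end

# Returns one integer per UTF-8 encoded code point of the string.
def code_points(str)
  str.unpack("U*")
end

# U+21D2 RIGHTWARDS DOUBLE ARROW, spelled out as its three UTF-8 bytes.
arrow = "\xE2\x87\x92"

puts utf8_bytes(arrow).inspect                            # [226, 135, 146]
puts code_points(arrow).map { |cp| "U+%04X" % cp }.inspect # ["U+21D2"]
```

Three bytes but a single code point is exactly the gap between per-byte and per-character operations that the rest of this answer is about.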
…but…
Note how you’re still getting byte access with “[]”. If you want to do character-aware string operations, see the documentation on the Multibyte extensions in the Rails API (for Rails 2.2, e.g. at http://railsapi.com/). Otherwise, things like “foo.reverse” will do the wrong thing by reversing the individual bytes; “foo.mb_chars.reverse” gets it right by going through the “mb_chars” proxy.
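To see why a byte-wise reverse is wrong, here is a rough sketch in plain Ruby with no Rails loaded. The helpers reverse_bytes and reverse_chars are hypothetical names for this example: the first imitates what a byte-oriented reverse does on Ruby 1.8, and the second reverses by code point, which is a simplification of what the “mb_chars” proxy does rather than its actual implementation.

```ruby
# Hypothetical helpers for illustration only (not the ActiveSupport code).

# Reverses the raw bytes, as a byte-oriented String#reverse would.
def reverse_bytes(str)
  str.unpack("C*").reverse.pack("C*")
end

# Reverses whole UTF-8 code points, keeping each multibyte sequence intact.
def reverse_chars(str)
  str.unpack("U*").reverse.pack("U*")
end

# "a", then U+21D2 as its three UTF-8 bytes, then "b".
text = "a\xE2\x87\x92b"

# The arrow's bytes come out in the wrong order, so this is no longer valid UTF-8.
puts reverse_bytes(text).unpack("C*").inspect  # [98, 146, 135, 226, 97]

# The arrow's three bytes stay together and only the character order flips.
puts reverse_chars(text).unpack("C*").inspect  # [98, 226, 135, 146, 97]
```

In a Rails 2.x application you would normally just call “mb_chars” instead of writing helpers like these.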