In my database, I have the following entry:
id | name | info
1 | John Smith | Çö ¿¬¼
As you can tell, the info column displays incorrectly. The text is actually Korean.
In Chrome, when I switch the browser encoding from UTF-8 to Korean (‘euc-kr’, I think), I actually manage to view the text as such:
id | name | info
1 | John Smith | 횉철 쩔짭쩌
I then manually copy the text into the info column in the database and save, and now I can view it in UTF-8 without switching my browser’s encoding.
Awesome. Now I’d like to get that same thing done in Rails, not manually. So starting with the original entry again, I go to the console and type:
require 'iconv'
u = User.find(1)
info = u.info
new_info = Iconv.iconv('euc-kr','UTF-8', info)
u.update_attribute('info', new_info)
However, what I end up with is something resembling \x{A2AF}\x{A8FA}\x{A1C6} \x{A2A5}\x{A8A2} in the database, not 횉철 쩔짭쩌.
I have only a very basic understanding of Unicode and encodings.
Can someone please explain what’s going on here and how to get around that?
The desired result is what I achieved manually.
Thanks!
Wow. I’m beating myself over the head now. After hours of trying to resolve this, I finally figured it out myself a few minutes after I posted a question here.
The solution consists of three simple steps:
STEP 1:
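I almost had it right. Iconv.iconv takes the target encoding first and the source encoding second, so my call was actually converting from UTF-8 to euc-kr. It should be the other way around, with 'UTF-8' as the first argument and 'euc-kr' as the second.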
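As a rough sketch of the corrected direction: with Iconv the call would presumably become Iconv.iconv('UTF-8', 'euc-kr', info). Iconv was removed from Ruby's standard library in 2.0, so the example below uses String#encode instead. The sample string and the variable names are made up for illustration and are not the data from the question.

```ruby
# Hypothetical sample data: "\uD55C\uAE00" is the Korean word for "Hangul".
original = "\uD55C\uAE00"

# Simulate a value whose bytes are really EUC-KR, as it might come out of the database.
raw = original.encode("EUC-KR").b

# Label the bytes as EUC-KR, then transcode them to UTF-8.
# Iconv equivalent (target first, source second): Iconv.iconv('UTF-8', 'euc-kr', raw)
new_info = raw.dup.force_encoding("EUC-KR").encode("UTF-8")

puts new_info
```

The force_encoding call only changes the label on the bytes; encode is what actually rewrites them as UTF-8.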
STEP 2:
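I might still run into some errors in the text, so to be safe I tell Iconv to ignore any errors (with Iconv this is usually done by appending //IGNORE to the target encoding, as in 'UTF-8//IGNORE').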
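To illustrate what ignoring errors means in practice, here is a small sketch using String#encode, where the options invalid: :replace, undef: :replace and replace: "" play roughly the role that //IGNORE plays for Iconv. The sample bytes are hypothetical: a valid EUC-KR string with one stray byte appended.

```ruby
original = "\uD55C\uAE00"                      # hypothetical sample text
raw = original.encode("EUC-KR").b + "\xFF".b   # append a byte that is not valid EUC-KR

# A strict conversion raises as soon as it hits the bad byte.
begin
  raw.dup.force_encoding("EUC-KR").encode("UTF-8")
  strict_failed = false
rescue Encoding::InvalidByteSequenceError, Encoding::UndefinedConversionError
  strict_failed = true
end

# A lenient conversion drops whatever cannot be converted.
cleaned = raw.dup.force_encoding("EUC-KR").encode(
  "UTF-8", invalid: :replace, undef: :replace, replace: ""
)

puts cleaned
```

Dropping bytes silently can hide real data problems, so it may be worth logging the rows where the strict conversion fails.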
Finally, I actually get REAL KOREAN TEXT, yay!
The problem is, when I try to insert it into the database, what gets stored still isn’t that text, even though the conversion itself produced the right text. So why is that? On to the last step.
STEP 3:
Turns out the output from Iconv is an array. And so, we merge it into a single string with join.
And this actually works!
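A small sketch of why the array matters. Iconv.iconv returns an array with one converted string per input string; the plain array below is a stand-in for that return value, not real Iconv output.

```ruby
# Stand-in for what Iconv.iconv hands back: an array holding the converted string.
converted = ["\uD55C\uAE00"]

as_array  = converted.to_s   # string form of the array, brackets and quotes included
as_string = converted.join   # just the converted text

puts as_array
puts as_string
```

Exactly what ends up in the column when an array is assigned depends on how the attribute is serialized, but in any case it is not the bare string, which is why join fixes it.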
The final code:
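Putting the three steps together, a minimal sketch might look like the following. With Iconv it would presumably read Iconv.iconv('UTF-8//IGNORE', 'euc-kr', u.info).join; the version below uses String#encode so that it runs without the iconv gem. The helper name euc_kr_to_utf8 is made up for illustration.

```ruby
# Hypothetical helper: treat the bytes as EUC-KR and return a UTF-8 string,
# dropping anything that cannot be converted.
def euc_kr_to_utf8(raw)
  raw.dup.force_encoding("EUC-KR").encode(
    "UTF-8", invalid: :replace, undef: :replace, replace: ""
  )
end

# In the Rails console this would be used roughly like:
#   u = User.find(1)
#   u.update_attribute('info', euc_kr_to_utf8(u.info))

sample = "\uD55C\uAE00".encode("EUC-KR").b   # hypothetical EUC-KR bytes
puts euc_kr_to_utf8(sample)
```

String#encode returns a single string rather than an array, so no join is needed in this variant.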
Hope this helps whoever sees this (and knowing myself, probably future me).