My application lets users enter text. When they copy and paste from MS Word, the pasted text contains smart quotes (“ ”), smart apostrophes (‘ ’) and ellipses (…). These characters get saved into the database and cause problems. What is the best way to replace these non-UTF-8 characters with plain quotes ("), apostrophes (') and periods (...)?
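A minimal sketch of the replacement itself, assuming the text is already a valid UTF-8 string (gsub raises "invalid byte sequence" otherwise). The constant and method names are hypothetical; the code relies on String#gsub accepting a hash, which substitutes each match with the hash value for the matched text.

```ruby
# Hypothetical mapping of Word's "smart" punctuation to plain ASCII equivalents.
SMART_PUNCTUATION = {
  "\u2018" => "'",   # left single quotation mark
  "\u2019" => "'",   # right single quotation mark (also used as apostrophe)
  "\u201C" => '"',   # left double quotation mark
  "\u201D" => '"',   # right double quotation mark
  "\u2026" => "...", # horizontal ellipsis
}.freeze

# One regexp that matches any of the keys (Regexp.union escapes them).
SMART_PUNCTUATION_RE = Regexp.union(SMART_PUNCTUATION.keys)

# Returns a copy of +text+ with every smart character replaced via the hash.
def straighten_punctuation(text)
  text.gsub(SMART_PUNCTUATION_RE, SMART_PUNCTUATION)
end
```

For example, straighten_punctuation("\u201CHi\u201D") returns "\"Hi\"". Keeping the mapping in one hash makes it easy to add other characters Word substitutes, such as en and em dashes.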
Also, how do you test this functionality? I added a test containing these special characters, with # encoding: ISO-8859-1 at the top of the file. The special characters caused the tests to stop running:

/home/george/.rvm/gems/ruby-1.9.2-p180/gems/redgreen-1.2.2/lib/redgreen.rb:62:in 'sub': invalid byte sequence in UTF-8 (ArgumentError)…

Apparently the redgreen gem is incompatible with these characters?
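One way to keep the test file itself free of encoding trouble, offered as a suggestion rather than a confirmed fix for redgreen: write the special characters as \u escapes, so the source stays plain ASCII and every string in the test is UTF-8. This sketch assumes minitest 5+ (where the base class is Minitest::Test); in a Rails app you would use your usual test base class, and straighten_punctuation is a hypothetical stand-in for the method under test.

```ruby
# encoding: UTF-8
# The file contains only ASCII; the smart characters are \u escapes,
# so the string literals below are UTF-8 regardless of how tools read the file.
require "minitest/autorun"

# Hypothetical method under test: replaces smart quotes/apostrophes and ellipses.
def straighten_punctuation(text)
  text.gsub(/[\u2018\u2019]/, "'").gsub(/[\u201C\u201D]/, '"').gsub("\u2026", "...")
end

class StraightenPunctuationTest < Minitest::Test
  def test_replaces_smart_quotes_and_ellipsis
    input = "\u201CIt\u2019s fine\u2026\u201D" # what a Word paste might contain
    assert_equal %q("It's fine..."), straighten_punctuation(input)
  end

  def test_leaves_plain_text_alone
    assert_equal "plain 'text'", straighten_punctuation("plain 'text'")
  end
end
```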
Thanks.
You can add a before_save callback that converts your text to the corresponding UTF-8 characters. If you have just one field that might contain non-UTF-8 characters, this is simple; if you have many fields, it is better to iterate dynamically over the changed text/string fields and fix the UTF-8 problem in each. Either way, you need to use String#encode.
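A minimal sketch of the String#encode step, assuming that any input which is not valid UTF-8 arrived as Windows-1252 bytes (the usual Word codepage on Windows; this is an assumption, not something the question confirms). The helper name to_utf8 is hypothetical, and the Rails wiring is shown only as a comment because it needs a real model.

```ruby
# Hypothetical helper: returns +text+ as valid UTF-8.
# Assumption: invalid UTF-8 input is really Windows-1252, where bytes
# 0x91-0x94 are smart quotes/apostrophes and 0x85 is the ellipsis.
def to_utf8(text)
  return text if text.nil?

  utf8 = text.dup.force_encoding("UTF-8")
  return utf8 if utf8.valid_encoding? # already fine, leave characters as-is

  # Reinterpret the raw bytes as Windows-1252 and transcode them to UTF-8;
  # bytes undefined in Windows-1252 become "?" instead of raising.
  text.dup.force_encoding("Windows-1252")
      .encode("UTF-8", invalid: :replace, undef: :replace, replace: "?")
end

# Hypothetical Rails usage (Comment and body are made-up names):
#   class Comment < ActiveRecord::Base
#     before_save { self.body = to_utf8(body) }
#   end
```

Note that encode only fixes the byte encoding: a Windows-1252 0x93 becomes the UTF-8 character “, not a plain ". If you also want ASCII quotes, run a gsub-based replacement on the result.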
For bonus points, you can also check whether a field was actually changed, using Rails' handy dirty-tracking helpers (such as changed? or <attribute>_changed?), before fixing it.
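A sketch of iterating only over changed string fields. To keep it runnable without Rails, the hypothetical helper normalize_changed takes the attribute hash and the list of changed names as plain arguments; in an ActiveRecord model those would come from attributes and changed (the dirty-tracking list of modified attribute names), as the comment at the end shows.

```ruby
# Hypothetical helper: for each attribute name in +changed_names+ whose value
# is a String, returns a fixed-up value (smart punctuation made plain ASCII).
# Assumes the values are already valid UTF-8.
def normalize_changed(attributes, changed_names)
  changed_names.each_with_object({}) do |name, fixes|
    value = attributes[name]
    next unless value.is_a?(String) # skip integers, dates, nil, ...

    fixes[name] = value.gsub(/[\u2018\u2019]/, "'")
                       .gsub(/[\u201C\u201D]/, '"')
                       .gsub("\u2026", "...")
  end
end

# Hypothetical Rails usage inside a model:
#   before_save :fix_changed_punctuation
#
#   def fix_changed_punctuation
#     normalize_changed(attributes, changed).each { |name, value| self[name] = value }
#   end
```

Unchanged fields are never touched, so records saved before the fix are only cleaned up when someone edits them.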