I have been using a rake file for a number of months to read in data from a CSV file. I recently tried to read in a new CSV file but keep getting the error “invalid byte sequence in UTF-8”. I have tried to work out manually where the problem is, but with little success. The CSV file is just text and URLs. There were a few unusual characters initially (where the original text had fancy bullet points), but I have removed those and cannot find any further anomalies.
Is there a way to get round this problem automatically, by identifying and removing the problem characters?
I’ve found a solution to discard all invalid UTF-8 bytes from a string:
(taken from this blog post)
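For reference, here is a minimal sketch of one way to do this, assuming Ruby 2.1 or later, where `String#scrub` is available. It is not necessarily the code from the blog post. The helper names `discard_invalid_utf8` and `invalid_utf8_line_numbers` are hypothetical, made up for this illustration. The second helper only reports which lines contain bad bytes, which may help with tracking them down by hand.

```ruby
# Hypothetical helper (name is an assumption, not from the blog post):
# returns a copy of `text`, treated as UTF-8, with invalid bytes removed.
def discard_invalid_utf8(text)
  # dup so the caller's string is not modified by force_encoding
  text.dup.force_encoding(Encoding::UTF_8).scrub("")
end

# Hypothetical helper: returns the 1-based numbers of the lines that
# contain bytes which are not valid UTF-8.
def invalid_utf8_line_numbers(text)
  # Split on newlines in binary mode, then check each line as UTF-8.
  text.b.each_line.with_index(1).select do |line, _number|
    !line.force_encoding(Encoding::UTF_8).valid_encoding?
  end.map { |_line, number| number }
end
```

In a rake task you could read the file with `File.binread`, pass the result through `discard_invalid_utf8`, and hand the cleaned string to `CSV.parse`. If the bad bytes came from fancy bullet points, the file may really be in another encoding such as Windows-1252, in which case transcoding it on read (for example `File.read(path, encoding: "Windows-1252:UTF-8")`) would keep those characters instead of dropping them. That is a guess about the file, not something the error message confirms.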
Hope this helps.