This code uses the Hpricot gem to extract HTML that contains UTF-8 characters:
# div holds the Hpricot search result for this HTML:
# <div>This is a test<a href="">测试</a></div>
div[0].to_html.gsub(/test/, "")
When that is run, it spits out this error (pointing at gsub):
ArgumentError (invalid byte sequence in UTF-8)
How can we fix this issue?
Figured out the issue. Hpricot's to_html calls methods that trigger the error, so to get rid of it we need to make the encoding of the whole Hpricot document UTF-8, not just that one string. We do that like this:

Then we can call other Hpricot methods. Now the whole document has UTF-8 encoding, and it won't give us any errors.
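As a hedged sketch of the underlying Ruby mechanics, assuming the raw HTML string is normalized to UTF-8 before it is handed to Hpricot, here is an Hpricot-free example. It reproduces the ArgumentError with a UTF-8-tagged string that contains an invalid byte, then shows String#force_encoding plus String#scrub making the same gsub succeed. This swaps in plain String normalization rather than any Hpricot API. The helper to_utf8 and the stray 0xFF byte are assumptions made for illustration.

```ruby
# Hypothetical helper (not from the original post): return a copy of raw
# tagged as UTF-8, replacing any bytes that are not valid UTF-8 with U+FFFD.
def to_utf8(raw)
  s = raw.dup.force_encoding(Encoding::UTF_8) # re-tags only; bytes unchanged
  s.valid_encoding? ? s : s.scrub("\uFFFD")   # String#scrub needs Ruby 2.1+
end

# The question's HTML as raw binary bytes (ASCII-8BIT), plus one stray byte
# that is not valid UTF-8 (an assumption standing in for bad page data).
binary = '<div>This is a test<a href="">测试</a></div>'.b + "\xFF".b

# Tagging those bytes as UTF-8 as-is reproduces the error from the question.
broken = binary.dup.force_encoding(Encoding::UTF_8)
message =
  begin
    broken.gsub(/test/, "")
    nil
  rescue ArgumentError => e
    e.message
  end
puts message # => invalid byte sequence in UTF-8

# Normalizing first makes the same gsub succeed.
clean = to_utf8(binary)
puts clean.gsub(/test/, "")
```

Re-tagging with force_encoding is only correct when the bytes really are UTF-8. If the page uses another charset (for example GBK), converting with String#encode from that charset is the appropriate tool instead.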