How do I translate or strip character sequences like “\xC2\xBB” in my strings in Ruby 1.9.2?
You will usually see hex bytes like that when the string is using an encoding that does not handle those bytes. If you know what encoding the string is supposed to be using, you can use String#force_encoding to re-interpret the bytes according to your desired encoding.
Both result in the same UTF-8 encoded string internally. When under the C locale, Ruby prints an escaped version to avoid printing binary data to the terminal (which, according to the locale setting, might not support it).
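As a rough illustration of the String#force_encoding step, here is a minimal sketch. It assumes the bytes really are UTF-8 and were merely tagged as binary (ASCII-8BIT); the helper name reinterpret_as_utf8 is made up for this example and is not part of Ruby.

```ruby
# Hypothetical helper (not a Ruby built-in): re-tag the bytes as UTF-8.
# force_encoding changes only the encoding label, never the bytes themselves.
def reinterpret_as_utf8(bytes)
  text = bytes.dup.force_encoding("UTF-8")
  # Assumption: the input is supposed to be UTF-8; fail loudly if it is not.
  raise ArgumentError, "bytes are not valid UTF-8" unless text.valid_encoding?
  text
end

# Simulate bytes that arrived tagged as binary, e.g. from a socket or file.
raw = "\xC2\xBB".dup.force_encoding("ASCII-8BIT")
puts raw.inspect        # prints "\xC2\xBB" (escaped, because the tag is binary)

text = reinterpret_as_utf8(raw)
puts text.encoding      # prints UTF-8
puts text == "\u00BB"   # prints true (U+00BB, right-pointing double angle quote)
```

The helper works on a copy, so the original string keeps its binary tag.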
If the string is already using the appropriate encoding, then you should re-encode the string to your desired output encoding before using it:
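The code sample that belonged here did not survive on this page. A minimal sketch of the idea, assuming the incoming bytes are ISO 8859-1 and the desired output is UTF-8, might look like this (latin1_to_utf8 is a hypothetical helper name chosen for the example):

```ruby
# Hypothetical helper: flag the bytes as ISO 8859-1, then transcode to UTF-8.
# force_encoding only relabels the bytes; encode actually converts them.
def latin1_to_utf8(bytes)
  bytes.dup.force_encoding("ISO-8859-1").encode("UTF-8")
end

# A single byte 0xBB is the right-pointing double angle quote in ISO 8859-1.
latin1 = "\xBB".dup.force_encoding("ASCII-8BIT")
utf8 = latin1_to_utf8(latin1)

puts utf8.encoding            # prints UTF-8
puts utf8.bytes.to_a.inspect  # prints [194, 187], i.e. 0xC2 0xBB
```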
Above, I use String#force_encoding to make sure the bytes in the string are flagged as ISO 8859-1 (because, for instance, a header accompanying the bytes said that they represented an ISO 8859-1 encoded string) and then use String#encode to re-encode it as UTF-8 (the desired output encoding).
Finally, if you really just want to strip out anything that is not ASCII, you could use the negated [:ascii:] character class with String#gsub:
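The original snippet is missing from this page, so the following is only a sketch of the approach. It assumes the string is already validly encoded text (for example UTF-8); strip_non_ascii is a hypothetical helper name.

```ruby
# Hypothetical helper: delete every character outside the ASCII range.
# [^[:ascii:]] is the negated POSIX bracket class for ASCII characters.
def strip_non_ascii(text)
  text.gsub(/[^[:ascii:]]/, "")
end

cleaned = strip_non_ascii("caf\u00E9 \u00BB menu")
puts cleaned.inspect   # prints "caf  menu" (two spaces remain around the removed quote mark)
```

Note that this silently drops characters rather than translating them, so it is only appropriate when losing the non-ASCII content is acceptable.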