I am using Ruby to open a URL and read its content. The content type of the file I am reading is ‘text/plain’.
The issue is that the content contains some characters that I want to escape. For example, one character that comes up in the plain text is “\240”, an octal escape for the byte 0xA0, which I understood to be a hyphen.
I am curious how this is being generated, because I don’t see a hyphen anywhere in the text. Yet it exists invisibly and “\240” shows up when I use puts to print the text in the console.
Second, how do I escape such weird characters? Ideally, I want to escape every character of the form “\[some number]”. I am using
"\240".gsub(Regexp.new("\\\d+"),"")
but it doesn’t seem to work.
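One likely reason this fails, sketched under assumptions: in a double-quoted Ruby string, "\240" is an octal escape for a single byte (0xA0, decimal 160), so the string holds no backslash and no digits for the pattern to match. If the source is encoded as ISO-8859-1 (an assumption; the question does not say), that byte is a non-breaking space rather than a hyphen, which would explain why nothing is visible. The sample string `raw` below is hypothetical.

```ruby
# Hypothetical sample standing in for the downloaded text; .b marks it as raw bytes.
raw = "10\240km".b

# "\240" is one byte, not a backslash followed by three digits.
escape_bytes = "\240".b.bytes

# Assumption: the source text is ISO-8859-1, where byte 0xA0 is a non-breaking space.
text = raw.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

# Replace the non-breaking space (U+00A0) with an ordinary space.
clean = text.gsub("\u00A0", " ")

puts clean
```

If the real encoding is something else, the same byte will map to a different character, so it is worth checking the charset the server reports before transcoding.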
Are there more traditional ways of sanitizing plain text content read from opening a URL?
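As one possible approach to the more general question, assuming the text is meant to be UTF-8 and that stray bytes can simply be dropped: String#scrub (Ruby 2.1 and later) replaces invalid byte sequences, and String#encode with the invalid:, undef: and replace: options can reduce a string to plain ASCII. Both sample strings below are hypothetical.

```ruby
# Hypothetical sample: UTF-8 text containing one stray invalid byte (0xA0).
sample = "abc\xA0xyz"

# Drop byte sequences that are not valid in the string's encoding (UTF-8 here).
scrubbed = sample.scrub("")

# Alternatively, reduce valid UTF-8 text to ASCII, dropping every other character.
ascii_only = "caf\u00E9 au lait".encode(
  Encoding::US_ASCII, invalid: :replace, undef: :replace, replace: ""
)

puts scrubbed
puts ascii_only
```

Note that scrub only removes bytes that are invalid in the current encoding; a correctly encoded non-breaking space would survive it, whereas the ASCII conversion discards every non-ASCII character, which may be more than is wanted.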
After having a play with this, I found the following regular expression which does the trick for me: