I have a script that records files with UTF-8 encoded names. However, the script's encoding / environment wasn't set up right, and it just recorded the raw bytes. I now have lots of lines in the file like this:
.../My\ Folders/My\ r\303\266m/...
So spaces in the filenames are escaped with \, and non-ASCII characters show up as UTF-8 bytes written as octal escapes like \303\266 (which is ö). I want to reverse this encoding. Is there some easy set of bash command-line commands I can chain together to undo it?
I could write millions of sed commands, but it'd take ages to list all the non-ASCII characters we have. Or I could start parsing it in Python. But I'm hoping there's some trick I can do.
In the end I used something like this:
Some of the files had % in them, which is a printf special character, so I had to 'double it up' so that it would be escaped and passed straight through. The -r in read stops read from interpreting the \'s; however, read doesn't turn "\ " into " ", so I needed the final sed.
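A minimal sketch of a pipeline that fits the description above (double the %, read -r, printf, then a final sed). This is not necessarily the exact command that was used: the function name decode_paths and the file names in the usage line are hypothetical, and it assumes bash's printf, which expands \ddd octal escapes in its format string and leaves "\ " untouched.

```shell
#!/usr/bin/env bash
# decode_paths (hypothetical name): reads escaped lines on stdin,
# writes the decoded lines on stdout.
decode_paths() {
  # 1. Double every % so printf prints it literally instead of
  #    treating it as the start of a format specifier.
  sed 's/%/%%/g' |
  # 2. read -r keeps the backslashes intact; IFS= keeps leading and
  #    trailing whitespace. The line is then used as the printf
  #    format string, so \303\266 becomes the raw bytes for ö.
  while IFS= read -r line; do
    printf "${line}\n"
  done |
  # 3. printf leaves "\ " alone, so turn it into a plain space here.
  sed 's/\\ / /g'
}
```

Usage would look something like decode_paths < paths.txt > decoded.txt. Note that because each line is used as a printf format string, any other backslash sequence printf understands (such as \n or \t) would be expanded as well, which is fine for this input but worth checking on other data.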