I want to replace every en dash (U+2013) in a file with a plain hyphen. I can do it in vim like so:
:%s/\%u2013/-/g
How do I do the equivalent in Perl? I thought this would do it, but it doesn't seem to work:
perl -i -pe 's/\x{2013}/-/g' my.dat
For a generic solution, Text::Unidecode transliterates pretty much anything that's thrown at it into pure US-ASCII.
So in your case this would work:
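A sketch of such a one-liner, assuming Text::Unidecode has been installed from CPAN (it is not a core module). It uses `-CSD`, the locale-independent form of `-C`, and the one-line sample file is made up for illustration:

```shell
#!/bin/sh
# Assumes Text::Unidecode is installed from CPAN (it is not a core module).
# Made-up sample "café – bar", written as raw UTF-8 bytes
# (\303\251 = é, \342\200\223 = U+2013 EN DASH).
printf 'caf\303\251 \342\200\223 bar\n' > my.dat

# -CSD               decode input / encode output as UTF-8, whatever the locale
# -MText::Unidecode  load the module; it exports unidecode() by default
# -i -pe             rewrite my.dat in place, line by line
perl -CSD -MText::Unidecode -i -pe '$_ = unidecode($_)' my.dat

cat my.dat   # cafe - bar
```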
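The -C is there to make sure the input is read as UTF-8 (and the output written as UTF-8). Without it, Perl treats the file as raw bytes, so the en dash is the three bytes E2 80 93 rather than the single character U+2013, which is why the one-liner in the question never matches. Note that a bare -C means -CSDL, which only takes effect when LC_ALL, LC_CTYPE or LANG contains UTF-8; use -CSD to force it regardless of the locale.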
The Text::Unidecode one-liner converts this:
into this:
The last one shows the limits of the module: it can't infer the vowels, so it doesn't get as-salaamu `alaykum from the original Arabic. It's still pretty good, I think.