I'm parsing HTML using HTML Agility Pack, and from time to time I get weird-looking strings like: "–". What is the simplest way to remove them? By the way, I'm using C#.
You probably need to look into why you are getting those characters in the first place; most likely something is wrong with the encoding.
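To illustrate what an encoding mismatch looks like, here is a minimal sketch that uses only the .NET base library. It assumes the page is UTF-8 and is being decoded as ISO-8859-1; the actual encodings in your case may differ, and the class and method names (`EncodingDemo`, `Decode`) are made up for the example.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: decode raw bytes using the named encoding.
    public static string Decode(byte[] bytes, string encodingName)
    {
        return Encoding.GetEncoding(encodingName).GetString(bytes);
    }

    public static void Main()
    {
        // "2010–2011" with an en dash (U+2013), stored as UTF-8 bytes.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes("2010\u20132011");

        // Wrong decoder: each of the dash's three bytes becomes its own character.
        string wrong = Decode(utf8Bytes, "ISO-8859-1");

        // Matching decoder: the dash survives as a single character.
        string right = Decode(utf8Bytes, "utf-8");

        Console.WriteLine(wrong.Length); // prints 11
        Console.WriteLine(right.Length); // prints 9
    }
}
```

If the strings come out right once the decoder matches the page's real encoding, nothing needs to be stripped at all.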
But if you do need to remove all the non-ASCII characters from a string, the regex [^ -~] does the trick.
See http://www.catonmat.net/blog/my-favorite-regex/ for an explanation of why that regex works.
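In C#, that regex could be applied roughly like this. It is only a sketch: `AsciiFilter` and `StripNonAscii` are names invented for the example, not part of HTML Agility Pack or .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AsciiFilter
{
    // Matches every character outside the range from space (U+0020) to tilde (U+007E),
    // i.e. everything that is not printable ASCII.
    private static readonly Regex NonPrintableAscii = new Regex(@"[^ -~]");

    // Hypothetical helper: returns the input with all such characters removed.
    public static string StripNonAscii(string input)
    {
        return NonPrintableAscii.Replace(input, string.Empty);
    }

    public static void Main()
    {
        // The en dash (U+2013) is outside the range, so it is dropped.
        Console.WriteLine(StripNonAscii("2010\u20132011")); // prints "20102011"
    }
}
```

Note that the range starts at the space character, so tabs, newlines, and other control characters are removed as well. If you need to keep line breaks, the character class would have to be widened, for example to [^ -~\r\n\t].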