I’m trying to write code that reads the content of a web page, but I’m not sure which encoding the page uses. How can I write generic code that returns the correct string, without the strange symbols?
The encoding might be (“UTF-8”, “windows-1256”, …).
I’ve tried UTF-8, but when the page is encoded with the second encoding mentioned above, I get strange symbols.
Here is the code I’m using:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("SOME-URL");
request.Method = "GET";
WebResponse response = request.GetResponse();
StreamReader streamReader = new StreamReader(response.GetResponseStream(), System.Text.Encoding.UTF8);
string content = streamReader.ReadToEnd();
And here is a link that causes the problem:
http://forum.khleeg.com/144828.html
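You must examine the response text and check the charset declared in the page’s meta tag.
These characters will be decoded correctly even before you know the real encoding, as they are plain ASCII, which UTF-8 and windows-1256 share.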
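As a rough illustration of that check, here is a minimal sketch that looks for a charset declaration near the start of the raw response bytes. The class name `CharsetSniffer`, the method name `FindMetaCharset`, the regular expression, and the 4096-byte cap are all assumptions made for this example, not part of any library:

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper (not a framework type): finds the charset declared in a meta tag.
public static class CharsetSniffer
{
    // Matches charset=... in both <meta charset="x"> and
    // <meta http-equiv="Content-Type" content="text/html; charset=x">.
    private static readonly Regex CharsetPattern = new Regex(
        @"<meta[^>]*?charset\s*=\s*[""']?\s*([A-Za-z0-9_\-]+)",
        RegexOptions.IgnoreCase);

    // Returns the declared charset name, or null if no declaration is found.
    public static string FindMetaCharset(byte[] body)
    {
        // Only the start of the document is inspected; 4096 bytes is an arbitrary cap.
        int length = Math.Min(body.Length, 4096);

        // ASCII decoding is enough here: tag and charset names are ASCII,
        // and any non-ASCII byte simply becomes '?'.
        string head = Encoding.ASCII.GetString(body, 0, length);

        Match match = CharsetPattern.Match(head);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

To use it, you would read the response stream into a byte array first, call `CharsetSniffer.FindMetaCharset(bytes)`, and only then decode the bytes with the encoding it names.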
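According to the data from this tag, you should create your Encoding object with the Encoding.GetEncoding method, passing it the charset name (for example "windows-1256").
Another way is to use the .ContentEncoding property of the HttpWebResponse, or its .CharacterSet property. Note that .ContentEncoding reflects the Content-Encoding header, which normally describes compression such as gzip, so .CharacterSet is the one that carries the charset from the Content-Type header.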
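Here is a minimal sketch of the header-based approach, adapted from the code in the question. `PageReader`, `ResolveEncoding`, and `ReadPage` are names invented for this example, and the fallback to UTF-8 is an assumption rather than a rule. The server may omit the charset or send one that does not match the page, so treat this as a starting point:

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper (not a framework type).
public static class PageReader
{
    // Maps a charset name to an Encoding, falling back to UTF-8 (an assumption)
    // when the name is missing or not recognised.
    // On .NET Core / .NET 5+, legacy code pages such as windows-1256 are only
    // available after calling
    // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
    public static Encoding ResolveEncoding(string charset)
    {
        if (string.IsNullOrWhiteSpace(charset))
            return Encoding.UTF8;
        try
        {
            // Some servers wrap the charset value in quotes; strip them.
            return Encoding.GetEncoding(charset.Trim().Trim('"', '\''));
        }
        catch (Exception e) when (e is ArgumentException || e is NotSupportedException)
        {
            return Encoding.UTF8;
        }
    }

    // Downloads the page and decodes it using the charset from the Content-Type header.
    public static string ReadPage(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream, ResolveEncoding(response.CharacterSet)))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Calling `PageReader.ReadPage("SOME-URL")` would then return the decoded text. If the header's charset turns out to be wrong for a given page, the charset declared inside the HTML itself is the next thing to check.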