Using the following code, I can download the HTML of a file from the internet:
WebClient wc = new WebClient();
// ....
string downloadedFile = wc.DownloadString("http://www.myurl.com/");
However, sometimes the file contains “interesting” characters that come out garbled: for example, é turns into Ã©, and characters such as ← and フシギダネ are mangled in the same way.
I think it may have something to do with different Unicode encodings, as each character gets changed into two new ones, perhaps because each character is being split in half, but I have very little knowledge in this area. What do you think is wrong?
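The "one character becomes two" symptom is consistent with UTF-8 bytes being decoded with a single-byte encoding. The following is a minimal sketch that reproduces it, assuming the page is served as UTF-8 and is being read as ISO-8859-1; the class and method names are made up for illustration.

```csharp
using System;
using System.Text;

static class MojibakeDemo
{
    // Hypothetical helper: encodes text as UTF-8, then (wrongly) decodes the
    // bytes as ISO-8859-1, where every byte becomes its own character.
    public static string DecodeUtf8AsLatin1(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("ISO-8859-1").GetString(utf8Bytes);
    }

    static void Main()
    {
        string original = "\u00E9"; // é

        // UTF-8 stores é as two bytes.
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(original))); // C3-A9

        // Decoded with the wrong encoding, the two bytes become two characters (Ã and ©).
        string garbled = DecodeUtf8AsLatin1(original);
        Console.WriteLine(garbled.Length); // 2

        // Decoded as UTF-8, the same bytes give back the original character.
        string roundTrip = Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(original));
        Console.WriteLine(roundTrip == original); // True
    }
}
```

Characters outside the Latin range, such as ← or フシギダネ, take three UTF-8 bytes each, so they would turn into three characters rather than two under the same assumption.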
Here’s a wrapper class for downloading that supports gzip and checks the encoding header and meta tags in order to decode the response correctly.
Instantiate the class and call GetPage().
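For the header-checking part of that idea, a much smaller sketch might look like the code below. It is not the class described above: it only reads the charset from the Content-Type response header, does not handle gzip or meta tags, and falls back to UTF-8 when no usable charset is found (that fallback is an assumption). The names EncodingAwareDownload, PickEncoding and DownloadText are hypothetical.

```csharp
using System;
using System.Net;
using System.Text;

static class EncodingAwareDownload
{
    // Hypothetical helper: picks an encoding from a Content-Type header value
    // such as "text/html; charset=utf-8". Falls back to UTF-8 (an assumption)
    // when the header is missing or names an unknown charset.
    public static Encoding PickEncoding(string contentType)
    {
        if (!string.IsNullOrEmpty(contentType))
        {
            foreach (string part in contentType.Split(';'))
            {
                string p = part.Trim();
                if (p.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
                {
                    string name = p.Substring("charset=".Length).Trim('"', '\'', ' ');
                    try { return Encoding.GetEncoding(name); }
                    catch (ArgumentException) { /* unknown charset name: use the fallback */ }
                    catch (NotSupportedException) { /* unsupported on this platform: use the fallback */ }
                }
            }
        }
        return Encoding.UTF8;
    }

    // Downloads the raw bytes first, then decodes them with the encoding
    // announced by the server instead of relying on a default.
    public static string DownloadText(string url)
    {
        using (var wc = new WebClient())
        {
            byte[] raw = wc.DownloadData(url);
            string contentType = wc.ResponseHeaders?[HttpResponseHeader.ContentType];
            return PickEncoding(contentType).GetString(raw);
        }
    }
}
```

Usage would be along the lines of string html = EncodingAwareDownload.DownloadText("http://www.myurl.com/");. If the page is known to be UTF-8, setting wc.Encoding = Encoding.UTF8 before calling DownloadString is a shorter alternative.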