I wish to download a web page, which may be in any possible text encoding, and save it as UTF16LE. Assuming I can determine the text’s encoding (by examining the HTTP header, HTML header, and/or BOM), how do I convert the text?
I am using Delphi 2009. Unfortunately, the help files do not explain how to get from any encoding to a Unicode (UTF16LE) string. Specific questions:
- Can I complete the conversion simply by setting the correct encoding on an AnsiString and assigning that to a UnicodeString?
- If so, how do I translate the various ‘charset’ descriptions that may label the web page (Big5, Shift-JIS, UTF-32, etc.) into the right format to initialize the AnsiString?
Thanks for your suggestions.
I have a preference for straight Win32 and VCL, but answers involving ActiveX controls may also be helpful.
How are you going to access the page? Embedded Internet Explorer, Indy, third-party tool, …? That might influence the answer because it determines the format of the input string.
Part 1: Getting the page
If you use the embedded Internet Explorer (TWebBrowser) to access the page, things are pretty straightforward:
The encoding of the web page should be handled properly by IE and by Delphi, and you end up with a UnicodeString containing the result (myText in the examples).
Part 2: Saving in UTF-16LE
Regardless of where your string came from, you can save it like this in the desired encoding:
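A minimal sketch of such a save, assuming a TStringList is an acceptable carrier for the text. The procedure name SaveAsUtf16LE and the file name page.txt are placeholders chosen for illustration:

```delphi
program SaveUtf16Demo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: writes myText to FileName as UTF-16LE.
procedure SaveAsUtf16LE(const myText: UnicodeString; const FileName: string);
var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    Lines.Text := myText;
    // TEncoding.Unicode is UTF-16LE; SaveToFile also writes that
    // encoding's preamble (the byte order mark) at the start of the file.
    Lines.SaveToFile(FileName, TEncoding.Unicode);
  finally
    Lines.Free;
  end;
end;

begin
  // Placeholder text and file name.
  SaveAsUtf16LE('Hello, world', 'page.txt');
end.
```

Assigning to Lines.Text may normalize line breaks. If the text has to be written exactly as it is, encoding it with TEncoding.Unicode.GetBytes and writing the bytes through a TFileStream would be an alternative.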
TEncoding.Unicode is UTF-16LE, but you could also use any other encoding.
Hope this helps.