I’m writing a VBScript to pull some data from a webpage, strip out a few key pieces of information, and write those to a file.
At the moment, my script to access the pages and save the page contents to a string is this:
Set WshShell = WScript.CreateObject("WScript.Shell")
Set http = CreateObject("Microsoft.XmlHttp")
'Load Webpage where address is URL
http.open "GET", URL, FALSE
http.send ""
'Assign webpage contents as a string to variable called Webpage
WEBPAGE = http.responseText
I need to save the content to a string so I can use a regular expression on it to pull out the content that I need.
This script works perfectly, EXCEPT for when the pages contain non-standard characters (such as é). When the page contains something like this, the script throws up an error and stops.
I’m guessing this is something to do with the encoding, but I can’t work out how to fix it. Can anyone point me in the right direction? Thanks guys
Edit
Thanks to the help here I realised I’ve asked the wrong question! It turns out I was downloading the content fine – the problem was, afterwards I was trying to edit it and write it out to a file, and the file was in the wrong format. I had this:
Set objTextFile = objFSO.OpenTextFile(OutputFile, 8, True,)
Changing it to this:
Set objTextFile = objFSO.OpenTextFile(OutputFile, 8, True, -1)
Seems to have fixed it. What a crazy world, eh? Thanks for the help.
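For anyone landing here later, this is a minimal sketch of the working write step. The fourth argument of OpenTextFile is the file format, and -1 (TristateTrue) opens the file as Unicode, so characters such as é are written without an error. The constant names are only labels for readability, and the sketch assumes OutputFile and WEBPAGE are already set as in the script above.

```vbscript
' Named constants for the OpenTextFile arguments (labels chosen for this sketch)
Const ForAppending = 8
Const TristateTrue = -1   ' open the file as Unicode

Dim objFSO, objTextFile
Set objFSO = CreateObject("Scripting.FileSystemObject")

' Open (or create) the output file for appending, in Unicode format.
' OutputFile is assumed to hold the target path.
Set objTextFile = objFSO.OpenTextFile(OutputFile, ForAppending, True, TristateTrue)

' WEBPAGE is assumed to hold the text taken from http.responseText
objTextFile.WriteLine WEBPAGE
objTextFile.Close
```

Note that appending Unicode text to a file that was originally created as ANSI mixes the two encodings, so it is safest to start from a new file.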
You may need to set the correct request headers before calling send.
Treat any header values as an example only: you will need to find out exactly what your website expects.
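A rough sketch of what setting headers before send can look like. The header names and values here (Accept-Charset, Accept-Language) are assumptions for illustration, not what any particular site requires, and URL is assumed to hold the page address as in the question.

```vbscript
Dim http
Set http = CreateObject("Microsoft.XmlHttp")

' Open a synchronous GET request; URL is assumed to hold the page address
http.open "GET", URL, False

' Headers must be set after open and before send.
' These values are illustrative assumptions only.
http.setRequestHeader "Accept-Charset", "utf-8"
http.setRequestHeader "Accept-Language", "en"

http.send

' Assign the page contents as a string
WEBPAGE = http.responseText
```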
EDIT:
What about this instead? It works OK here.
The E acute (É) in this case comes back as the string %C3%89, which is the percent-encoded form of its two UTF-8 bytes, and you can convert it to whatever character you choose if required.
EDIT2:
Just to add, if you’re doing this with VBScript you may find this method useful
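One approach that fits here, sketched under the assumption that the page is served as UTF-8, is to read the raw bytes from http.responseBody and decode them with ADODB.Stream rather than relying on responseText. The function name BytesToText and the "utf-8" charset are illustrative choices.

```vbscript
' Decode a byte array (such as http.responseBody) into a string
' using the named character set. BytesToText is a hypothetical helper name.
Function BytesToText(body, charset)
    Dim stream
    Set stream = CreateObject("ADODB.Stream")

    ' Write the raw bytes in binary mode
    stream.Type = 1          ' adTypeBinary
    stream.Open
    stream.Write body

    ' Rewind, then switch to text mode with the chosen charset
    stream.Position = 0
    stream.Type = 2          ' adTypeText
    stream.Charset = charset

    BytesToText = stream.ReadText
    stream.Close
End Function

' Usage, assuming http.send has already completed:
' WEBPAGE = BytesToText(http.responseBody, "utf-8")
```

If the site uses a different encoding, pass that charset name instead; the Content-Type response header usually says which one applies.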