Last question for today. I’m trying to find a way to parse the contents of the tables on this page: http://www7.pearsonvue.com/Dispatcher?application=VTCLocator&action=actStartApp&v=W2L&cid=445 into a variable, so that I can put them in an Excel file.
Putting the data into Excel after parsing it with BeautifulSoup is no problem.
But (there is always a “but”) the page source is quite strange, with an iframe inside.
#!/usr/bin/python
# -*- coding: utf-8 -*-
import xlwt
import urllib2
import sys
import re
from bs4 import BeautifulSoup as soup
import urllib
print("TEST FOR PTE TESTS CENTERS")
url = 'http://www6.pearsonvue.com/Dispatcher?application=VTCLocator&action=actStartApp&v=W2L&cid=445'
values = {
    'sortColumn' : 2,
    'sortDirection' : 1,
    'distanceUnits' : 0,
    'proximitySearchLimit' : 20,
    'countryCode' : 'GBR', # WE TRY FOR NOW WITH A SPECIFIC COUNTRY
}
user_agent = 'Mozilla/5 (Solaris 10) Gecko'
headers = { 'User-Agent' : user_agent }
data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
thePage = response.read()
the_page = soup(thePage)
result = the_page.find('frame', attrs={'name' : 'VTCLocatorPageFrame'})
print result # We have now the FRAME link in the result var
So please find above the source of the script I’m trying to get working.
After running the script, we have this in the result var:
If you have any ideas, they would be VERY helpful 🙂
Thanks in advance and via python!
Sorry that the question was not very clear. I’ve tried to find a solution, and here is the script I use:
I think the problem comes from the URL I use. I guess the IDs change from one request to the next …
http://www6.pearsonvue.com/Dispatcher?v=W2L&application=VTCLocator&HasXSes=Y&layerPath=ROOT.VTCLocator.SelTestCenterPage&wscid=199372577&layer=SelTestCenterPage&action=actDisplay&bfp=top.VTCLocatorPageFrame&bfpapp=top&wsid=1334887910891
It worked fine for an hour, and now it does not anymore! 🙂
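If my guess is right and the IDs in that URL (wsid, wscid) are issued per session, one thing I might try is to read the frame's src from each fresh response instead of hard-coding the URL. Below is a rough sketch of that idea, written for Python 3 with only the standard library. The names `FrameSrcFinder` and `frame_url` and the sample markup are made up for illustration, and I am assuming the frame tag carries a src attribute, which I have not confirmed for this site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSrcFinder(HTMLParser):
    """Remember the src of the first <frame>/<iframe> with a given name."""

    def __init__(self, frame_name):
        super().__init__()
        self.frame_name = frame_name
        self.src = None

    def handle_starttag(self, tag, attrs):
        # Only the first matching frame is kept.
        if tag in ("frame", "iframe") and self.src is None:
            attributes = dict(attrs)
            if attributes.get("name") == self.frame_name:
                self.src = attributes.get("src")


def frame_url(page_url, html, frame_name):
    """Return the absolute URL of the named frame, or None if it is absent."""
    finder = FrameSrcFinder(frame_name)
    finder.feed(html)
    if finder.src is None:
        return None
    # The src may be relative, so resolve it against the page URL.
    return urljoin(page_url, finder.src)


# Hypothetical markup standing in for the real response body.
sample = ('<frameset><frame name="VTCLocatorPageFrame" '
          'src="/Dispatcher?wsid=123"></frameset>')
print(frame_url("http://www6.pearsonvue.com/Dispatcher?application=VTCLocator",
                sample, "VTCLocatorPageFrame"))
```

The URL returned this way would then be requested in place of the fixed one. The server may also expect the same session cookies on that second request; that is an assumption on my side too.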