Last question for this day. I’m trying to find a way to parse the

Asked: June 2, 2026 (2026-06-02T10:44:33+00:00)

Last question for today. I’m trying to parse the content of the tables on this page: http://www7.pearsonvue.com/Dispatcher?application=VTCLocator&action=actStartApp&v=W2L&cid=445 into a variable, so that I can put it in an Excel file.

Writing the data to Excel after parsing it with BeautifulSoup is no problem.

But (there is always a “but”) the page source is quite strange, with an iframe inside.

#!/usr/bin/python
# -*- coding: utf-8 -*-

import xlwt
import urllib2
import sys
import re
from bs4 import BeautifulSoup as soup
import urllib

print("TEST FOR PTE TESTS CENTERS")

url = 'http://www6.pearsonvue.com/Dispatcher?application=VTCLocator&action=actStartApp&v=W2L&cid=445'
values = {
        'sortColumn' : 2,
        'sortDirection' : 1,
        'distanceUnits' : 0,
        'proximitySearchLimit'  : 20,
        'countryCode'  : 'GBR', # WE TRY FOR NOW WITH A SPECIFIC COUNTRY

            }

user_agent = 'Mozilla/5 (Solaris 10) Gecko'
headers = { 'User-Agent' : user_agent }

data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
thePage = response.read()
the_page = soup(thePage)


# Look up the <frame> element that holds the real content
result = the_page.find('frame', attrs={'name' : 'VTCLocatorPageFrame'})
print result # result now holds the <frame> tag, including its src link

Above is the source of the script I’m trying to get working.

After running the script, this is what the result variable contains:

If you have any ideas, they would be VERY helpful 🙂

Thanks in advance, and via Python!


1 Answer

Editorial Team · Answer 1 · 2026-06-02T10:44:34+00:00

Sorry for the question, which was not very clear. I’ve tried to find a solution, and here is the script I use:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import xlwt
import urllib2
import sys
import re
from bs4 import BeautifulSoup as soup
import urllib
liste_countries = ['USA','AFG','ALA','ALB','DZA','ASM','AND','AGO','AIA','ATA','ATG','ARG','ARM','ABW','AUS','AUT','AZE','BHS','BHR','BGD','BRB','BLR','BEL','BLZ','BEN','BMU','BTN','BOL','BES','BIH','BWA','BVT','BRA','IOT','BRN','BGR','BFA','BDI','BDI','KHM','CMR','CAN','CPV','CYM','CAF','TCD','CHL','CHN','CXR','CCK','COL','COM','COG','COD','COK','CRI','CIV','HRV','CUW','CYP','CZE','DNK','DJI','DMA','DOM','ECU','EGY','SLV','GNQ','ERI','EST','ETH','FLK','FRO','FJI','FIN','FRA','GUF','PYF','ATF','GAB','GMB','GEO','DEU','GHA','GIB','GRC','GRL','GRD','GLP','GUM','GTM','GGY','GIN','GNB','GUY','HTI','HMD','HND','HKG','HUN','ISL','IND','IDN','IRN','IRQ','IRL','IMN','ISR','ITA','JAM','JPN','JEY','JOR','KAZ','KEN','KIR','PRK','KOR','KWT','KGZ','LAO','LVA','LBN','LSO','LBR','LBY','LIE','LTU','LUX','MAC','MKD','MDG','MWI','MYS','MDV','MLI','MLT','MHL','MTQ','MRT','MUS','MYT','MEX','FSM','MDA','MCO','MNG','MNE','MSR','MAR','MOZ','MMR','NAM','NRU','NPL','NLD','NCL','NZL','NIC','NER','NGA','NIU','NFK','MNP','NOR','OMN','PAK','PLW','PSE','PAN','PNG','PRY','PER','PHL','PCN','POL','PRT','PRI','QAT','REU','ROU','RUS','RWA','BLM','KNA','LCA','MAF','WSM','SMR','STP','SAU','SEN','SRB','SYC','SLE','SGP','SXM','SVK','SVN','SLB','SOM','ZAF','SGS','SSD','ESP','LKA','SHN','SPM','VCT','SDN','SUR','SJM','SWZ','SWE','CHE','TWN','TJK','TZA','THA','TLS','TKL','TON','TTO','TUN','TUR','TKM','TCA','TUV','UGA','UKR','ARE','GBR','URY','UMI','UZB','VUT','VAT','VEN','VNM','VGB','VIR','WLF','ESH','YEM','ZMB','ZWE']


name_doc_out = raw_input("What do you want for name for the Excel output document ? >>> ")
wb = xlwt.Workbook(encoding='utf-8')
ws = wb.add_sheet("PTE_TC")
x = 0
y = 0
numero = 0
total = len(liste_countries)
# (the villes_us city counters were removed: villes_us is never defined, so len(villes_us) raised a NameError)
for liste in liste_countries:
            if 0 == 1:
                        print("THIS IF IS JUST FOR TEST")
            else:
                        # Increment first so the counter runs from 1 to total
                        numero = numero + 1
                        print("Fetching country number %s of %s" % (numero, total))
                        url = 'http://www6.pearsonvue.com/Dispatcher?v=W2L&application=VTCLocator&HasXSes=Y&layerPath=ROOT.VTCLocator.SelTestCenterPage&wscid=199372577&layer=SelTestCenterPage&action=actDisplay&bfp=top.VTCLocatorPageFrame&bfpapp=top&wsid=1334887910891'
                        values = {
                                'sortColumn' : 2,
                                'sortDirection' : 1,
                                'distanceUnits' : 0,
                                'proximitySearchLimit'  : 20,
                                'countryCode'  : liste,

                                    }

                        user_agent = 'Mozilla/5 (Solaris 10) Gecko'
                        headers = { 'User-Agent' : user_agent }

                        data = urllib.urlencode(values)
                        req = urllib2.Request(url, data, headers)
                        response = urllib2.urlopen(req)
                        thePage = response.read()
                        the_page = soup(thePage)

                        #print the_page
                        tableau = the_page.find('table', attrs={'id' : 'apptable'})
                        print tableau
                        try:
                                    rows = tableau.findAll('tr')
                                    for tr in rows:
                                                cols = tr.findAll('td')
                                                # delete / remove the td cells that are not wanted
                                                y = 0
                                                x = x + 1
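                                                for td in cols:
                                                            print td.text
                                                            ws.write(x,y,td.text.strip())
                                                            y = y + 1
                        except (IndexError, AttributeError):
                                    # No 'apptable' table on this page (tableau is None): skip this country
                                    pass
                        # Save once per country instead of once per cell
                        wb.save("%s.xls" % name_doc_out)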
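
The row loop above can also be separated from the Excel writing, which makes it easier to notice a missing table instead of silently passing over it. Here is a minimal sketch of that idea, assuming Python 3 and only the standard library (unlike the Python 2 script above). TableRows and extract_rows are hypothetical names, and the sample table is made up to show the input shape; the real page's markup may differ.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the <td> text of every row of the table with a given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0     # 0 = outside the target table, 1 = directly inside it
        self.rows = []     # one list of cell strings per <tr>
        self.cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # Enter the target table, or track a table nested inside it
            if self.depth or dict(attrs).get("id") == self.table_id:
                self.depth += 1
        elif self.depth == 1 and tag == "tr":
            self.rows.append([])
        elif self.depth == 1 and tag == "td" and self.rows:
            self.cell = []

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1
        elif tag == "td" and self.depth == 1 and self.cell is not None:
            # Join the fragments and collapse runs of whitespace
            self.rows[-1].append(" ".join("".join(self.cell).split()))
            self.cell = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def extract_rows(html, table_id="apptable"):
    """Return the non-empty rows of the table, or [] if it is absent."""
    parser = TableRows(table_id)
    parser.feed(html)
    return [row for row in parser.rows if row]


# Made-up table, only to illustrate the expected structure
sample = """
<table id="apptable">
<tr><th>Name</th><th>City</th></tr>
<tr><td> Centre A </td><td>London</td></tr>
<tr><td>Centre B</td><td>Leeds</td></tr>
</table>
"""
rows = extract_rows(sample)
print(rows)
```

An empty list then signals that the page did not contain the table, which is a useful hint when the request itself went wrong.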

I think the problem comes from the URL I use. I guess the IDs in it change from one request to the next:
http://www6.pearsonvue.com/Dispatcher?v=W2L&application=VTCLocator&HasXSes=Y&layerPath=ROOT.VTCLocator.SelTestCenterPage&wscid=199372577&layer=SelTestCenterPage&action=actDisplay&bfp=top.VTCLocatorPageFrame&bfpapp=top&wsid=1334887910891

It worked fine for an hour, and now it no longer does! 🙂
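
If that guess is right and wsid / wscid are issued per session, one possible approach is to request the entry page on every run and read the frame's src from it, rather than pasting in a URL copied from the browser. Below is a minimal sketch of that step, assuming Python 3 and only the standard library (unlike the Python 2 script above). FrameSrcFinder, find_frame_url and the sample frameset are hypothetical and only illustrate the idea; the real page's markup may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSrcFinder(HTMLParser):
    """Remember the src of the first <frame>/<iframe> with a given name."""

    def __init__(self, frame_name):
        super().__init__()
        self.frame_name = frame_name
        self.src = None

    def handle_starttag(self, tag, attrs):
        if tag in ("frame", "iframe") and self.src is None:
            attributes = dict(attrs)
            if attributes.get("name") == self.frame_name:
                self.src = attributes.get("src")


def find_frame_url(html, base_url, frame_name="VTCLocatorPageFrame"):
    """Return the absolute URL the named frame points to, or None."""
    finder = FrameSrcFinder(frame_name)
    finder.feed(html)
    if finder.src is None:
        return None
    # The src may be relative, so resolve it against the page URL
    return urljoin(base_url, finder.src)


# Made-up frameset, only to show the shape of the input
sample = """
<html><frameset>
  <frame name="VTCLocatorPageFrame" src="/Dispatcher?layer=SelTestCenterPage&amp;wsid=123">
</frameset></html>
"""
entry_url = "http://www6.pearsonvue.com/Dispatcher?application=VTCLocator"
frame_url = find_frame_url(sample, entry_url)
print(frame_url)
```

The returned URL could then be used for the POST request in place of the hard-coded one, so that each run picks up whatever session values the server hands out.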
