I want to scrape the web page below. As you can see, it is about forex trading, and the site lists all live trade records here:
http://www.forexfactory.com/trades.php?reset=1
I usually use Python to read the page source and parse the information with BeautifulSoup. In this case, however, there is a clickable “more” button at the end of the pane:
Screenshot: http://i.minus.com/ibfq5BgLjta0Lo.jpg
If I click it once, the list of trades is extended, and a new “more” button appears at the end of the list. After two or three clicks, the whole list is shown. How can I make Python click “more” programmatically, so that I can fetch the complete list of trade records?
A follow-up question: usually we read the HTML source and use a parser to extract the text from the tags. However, if you select the whole page with the mouse and press Ctrl+C, you get all the text shown in the browser, without any tags. I thought this might be another way to fetch the information. It seems Python can only read the HTML source, though. Is there a way to do what I described: select the whole page content and copy it, so that we get one long string containing all the text without tags?
Thanks a lot, gurus!
Basically, on clicking “more”, an X-Requested-With: XMLHttpRequest header is set. You can also see it using Firefox’s Live HTTP Headers add-on. It means an AJAX request is being made. So, basically, you have two choices:
1) Observe the URL patterns on clicking “more” and use them in your code.
2) You may be interested in the python-spidermonkey module, which aims at executing JavaScript from Python.
You can also use Selenium. It’s a library that allows you to control a real web browser from your language of choice.
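Choice 1 could be sketched roughly as follows, using only the standard library. This is a minimal sketch, not the site's documented API: the helper names build_ajax_request and fetch_ajax are made up for illustration, the User-Agent value is an arbitrary choice, and the URL you pass in has to be whatever you actually observe in the browser when you click “more”.

```python
import urllib.request


def build_ajax_request(url):
    """Build a GET request carrying the header the browser sends on "more".

    `url` is an assumption: it must be the URL you observed yourself
    (for example with a header-inspection add-on) when clicking "more".
    """
    return urllib.request.Request(
        url,
        headers={
            # Same marker header the answer describes for the AJAX call.
            "X-Requested-With": "XMLHttpRequest",
            # Arbitrary browser-like value; some sites reject the default.
            "User-Agent": "Mozilla/5.0",
        },
    )


def fetch_ajax(url, timeout=10):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_ajax_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Usage (the URL is a placeholder, replace it with the one you observed):
#   fragment = fetch_ajax("http://example.com/url-you-observed")
#   ...then feed `fragment` to BeautifulSoup as usual.
```

If the observed URLs follow a pattern (a page number or an offset, say), you can call fetch_ajax in a loop and parse each response the same way you parse the first page.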
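For the Selenium route, a rough sketch might look like the one below. It assumes the Selenium 4 Python bindings with a working Firefox driver, and it assumes the “more” control is a link whose visible text is exactly "more"; check the page and adjust the locator if that is not the case. The helper names click_until_gone and fetch_full_page are made up for illustration.

```python
import time


def click_until_gone(driver, locator, max_clicks=10, pause=1.0):
    """Click the element matching `locator` until it is no longer found.

    Returns the number of clicks performed. `max_clicks` is a safety
    limit so the loop cannot run forever.
    """
    clicks = 0
    while clicks < max_clicks:
        buttons = driver.find_elements(*locator)
        if not buttons or not buttons[0].is_displayed():
            break
        buttons[0].click()
        clicks += 1
        time.sleep(pause)  # crude wait for the new rows to load
    return clicks


def fetch_full_page(url, link_text="more"):
    """Open `url`, expand the list, and return (html, visible_text)."""
    # Imported here so click_until_gone can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # ASSUMPTION: the button is a link with the visible text "more".
        click_until_gone(driver, (By.LINK_TEXT, link_text))
        html = driver.page_source
        text = driver.find_element(By.TAG_NAME, "body").text
        return html, text
    finally:
        driver.quit()


# Usage:
#   html, text = fetch_full_page("http://www.forexfactory.com/trades.php?reset=1")
```

The html value can go to BeautifulSoup as before. The text value is the rendered text of the page body without tags, which is roughly what select-all and copy gives you in the browser, so it may also cover the follow-up question.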