I’m trying to collect data from a web page using Python (the site doesn’t have an API). I’ve never done this before.
I think it’s ASP.NET (which I know very little about), or some library with form helpers, and that makes it really complicated to recreate a request “manually” by just sending the same POST data with urllib. The site expects all kinds of weird, human-unfriendly POST data, and god knows what it means (well, god and the developers).
I tried removing these fields and keeping only the basic data, but that breaks the request. For instance, when I change page in the pagination, some kind of “hash-ish” string changes too (a simple page=x query string, which is what you would expect, was not enough).
So instead of spending hours trying to figure out how everything works, I’m hoping there’s a library that can help me here: something with a browser-like interface, where I could give it a URL, tell it which forms to fill in and which links to follow, and it would automatically handle cookies, hidden inputs, etc. and then give me the HTML output.
I hope you understand what I’m looking for. Maybe it doesn’t exist, but I feel like it would be useful, so it should exist.
Other ways to tackle this problem would also be helpful.
Thanks
Look at Selenium WebDriver or ghost.py-like projects if you need browser-like behavior.
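If a full browser turns out to be more than you need, here is a minimal sketch of the "hidden inputs" part using only the standard library. It assumes the "hash-ish" strings live in `<input type="hidden">` fields that must be sent back unchanged, which is how ASP.NET Web Forms usually behaves but which I have not checked against your site. The sample HTML, the field values, the `query` field and the helper names (`HiddenInputCollector`, `merge_form_fields`) are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs from every <input type="hidden"> tag."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def merge_form_fields(html, overrides):
    """Return the page's hidden fields plus the values you want to set."""
    collector = HiddenInputCollector()
    collector.feed(html)
    collector.close()
    fields = dict(collector.fields)
    fields.update(overrides)
    return fields


# Made-up stand-in for a page you have already downloaded.
SAMPLE_HTML = """
<form method="post" action="list.aspx">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTA4" />
  <input type="hidden" name="__EVENTVALIDATION" value="wEWAgL" />
  <input type="text" name="query" value="" />
</form>
"""

# Hidden fields are copied as-is; only "query" is supplied by hand.
fields = merge_form_fields(SAMPLE_HTML, {"query": "python"})

# Encoded body, ready to pass as the data argument of a POST request.
body = urlencode(fields).encode("ascii")
```

To actually send it, you would fetch the page first, build `body` from that response, and post it back through an opener created with `urllib.request.build_opener(urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))` so that cookies carry over between the two requests. If the page builds its post data with JavaScript, this will not be enough and a real browser driver is the safer route.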