I want to automatically grab some content from a page.
I wonder whether it is possible to:
1. Run my own JavaScript on the page after it has loaded. (I use Firefox. I cannot change the content of the page; I just want to run JS in my browser.) The script will use getElementById or a similar method to get the link to the next page.
2. Run JavaScript to collect the content I am interested in (some URLs) on that page and store those URLs in a local file.
3. Go to the next page (the next page actually gets loaded in my browser, but I do not need to intervene at all) and repeat steps 1 and 2 until there is no next page.
The classic way to do this is to write a Perl script using LWP, a PHP script using cURL, etc. But that is all server side. I wonder if I can do it client side.
I do something rather similar, actually.
By using Greasemonkey, you can write a user-script that will interact with the pages however you need. You can get the next page link and scroll things as you like.
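A minimal sketch of what such a user-script could look like, not taken from any real site. The `@include` pattern, the `.jpg` filter, the helper names `collectUrls` and `findNextHref`, and the assumption that the next-page link's text starts with "next" are all made up for illustration and would need adjusting.

```javascript
// ==UserScript==
// @name     Collect links (sketch)
// @include  http://example.com/*
// ==/UserScript==

// Return the hrefs of all anchors whose URL matches `pattern`, without duplicates.
function collectUrls(anchors, pattern) {
  const seen = new Set();
  for (const a of anchors) {
    if (a.href && pattern.test(a.href)) seen.add(a.href);
  }
  return Array.from(seen);
}

// Return the href of the first anchor whose text starts with "next", or null.
// (Assumption: the site labels its pagination link that way.)
function findNextHref(anchors) {
  for (const a of anchors) {
    const text = (a.textContent || "").trim().toLowerCase();
    if (a.href && /^next\b/.test(text)) return a.href;
  }
  return null;
}

// Only touch the DOM when actually running in a browser.
if (typeof document !== "undefined") {
  const anchors = Array.from(document.querySelectorAll("a[href]"));
  const urls = collectUrls(anchors, /\.jpg$/i);
  console.log(urls.join("\n"));
  // A real script would save `urls` somewhere before leaving the page.
  const next = findNextHref(anchors);
  if (next) {
    window.location.href = next; // the user-script runs again on the next page
  }
}
```

Keeping the two helpers free of direct DOM access makes them easy to try out against a plain array of `{ href, textContent }` objects before pointing them at a live page.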
You can also store any data locally, within Firefox, through some new functions called GM_getValue and GM_setValue.
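A rough sketch of accumulating URLs across page loads with those two functions. The helper `appendUrls`, the `store` wrapper object and the key name `collectedUrls` are hypothetical. The list is kept as a JSON string on the assumption that the storage only handles simple values.

```javascript
// The user-script metadata block would also need:
// @grant GM_getValue
// @grant GM_setValue

// Append `newUrls` to the list saved under `key` and return the merged list.
// `store` is any object with getValue(key, defaultValue) and setValue(key, value).
function appendUrls(store, key, newUrls) {
  const saved = JSON.parse(store.getValue(key, "[]"));
  // A Set drops duplicates while keeping first-seen order.
  const merged = Array.from(new Set(saved.concat(newUrls)));
  store.setValue(key, JSON.stringify(merged));
  return merged;
}

// Inside Greasemonkey, back the store with the GM_* functions.
if (typeof GM_getValue === "function" && typeof GM_setValue === "function") {
  const gmStore = { getValue: GM_getValue, setValue: GM_setValue };
  appendUrls(gmStore, "collectedUrls", [window.location.href]);
}
```

Newer Greasemonkey releases expose promise-based `GM.getValue` and `GM.setValue` instead, so the wrapper object would have to be adapted there.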
I take the lazy way out. I just generate a long list of the URLs that I find when navigating the pages. I use a crude “document.write” approach to dump out my list of URLs as a batch file that relies on wget. At that point I copy and paste the batch file and then run it.
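A sketch of how the batch-file text could be produced. The function name `buildWgetBatch` is made up, the output assumes a Windows `.bat` file (where a literal `%` has to be doubled), and the result is shown in a `<pre>` element here, not written with document.write.

```javascript
// Turn a list of URLs into the text of a batch file, one wget call per line.
function buildWgetBatch(urls) {
  return urls
    // "%" is special in .bat files, so URL escapes like %20 must become %%20.
    .map((url) => 'wget "' + url.replace(/%/g, "%%") + '"')
    .join("\r\n");
}

// In a browser, show the result so it can be copied and pasted.
if (typeof document !== "undefined") {
  const pre = document.createElement("pre");
  pre.textContent = buildWgetBatch(["http://example.com/a.jpg"]); // placeholder URL
  document.body.appendChild(pre);
}
```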
If you need to run this often enough that it should be automated, there used to be a way to turn Greasemonkey scripts into Firefox extensions, which have access to more power.
Another option is, AFAIK, currently Chrome only. You can collect whatever information you need and build a large file from it, then use the download attribute of a link to get a single-click way to save things.
Update
I was going to share the full code for what I was doing, but it was so tied to a particular website that it wouldn’t have really helped, so I’ll go for a more “general” solution.
Warning: this code was typed on the fly and may not actually be correct.