I’ve got a website that I’d like to pull data from and it’s really

Asked: May 23, 2026 (2026-05-23T15:11:52+00:00)

I’ve got a website that I’d like to pull data from and it’s really stuck in the stone ages. There’s no web service, no API and it’s very much an ASP/Session/table-based-layout page. Pretty fugly.

I’d like to just screen scrape it and use JS (CoffeeScript) to automate that. I wonder if this is possible. I could do this with C# and LINQPad, but then I’m stuck parsing the tables (and sub-tables and sub-sub-tables) with regex. Plus, if I do it with JS or CoffeeScript, I’ll get much more comfortable with those languages and I’ll be able to use jQuery to pull elements out of the DOM.

I see two possibilities here:

1. Use C# and find a library that does what jQuery does, but in C# code.
2. Use CoffeeScript (JS) and use jQuery to find the elements that I’m looking for in the page.

I’d also like to automate the page a bit (get next set of results). This is strictly for personal use — I’m not pulling results of someone’s search to use in my business. I just want to make a crappy search engine do what I want.

1 Answer

Editorial Team · Answer 1 · 2026-05-23T15:11:53+00:00

I wrote a class that lets you supply a list of URLs and a code block to scrape pages from inside a Chrome extension. You can find the GitHub repo here: https://github.com/jkarmel/Executor. It could use some more testing and I need to work on the documentation, but it looks like it might be what you are looking for.

Here is how you would use it to get all the links from a few different pages:

/*
* background.js by Jeremy Karmel. 
*/

var URLS = ['http://www.apple.com/',
        'http://www.google.com/',
        'http://www.facebook.com/',
        'http://www.stanford.edu'];

// Function will be provided to the executor to collect information
var getLinks = function() {
    var links = [];
    // Select every anchor element on the page
    var $links = $('a');
    $links.each(function(i, val) { links.push(val.href); });
    // Send the collected hrefs back, keyed by the page they came from
    var request = {data: links, url: window.location.href};
    chrome.extension.sendRequest(request);
};

var main = function() {
    var specForUsersTopics = {
        urls     : URLS,
        code     : getLinks,

        callback : function(results) {
            for (var url in results) {
                console.log(url + ' has ' + results[url].length + ' links.');
                var links = results[url];
                for (var i = 0; i < links.length; i++) 
                    console.log('   ' + links[i]);
            }
            console.log('all done!!!!');
        }
    };
    var exec = Executor(specForUsersTopics);
    exec.start();
}

main();

So basically, the code that collects the links is supplied to the executor instance, and you then do whatever you want with the results in the callback. It can deal with longish lists of URLs (~1000) and it works on more than one at a time (5 by default). It doesn’t handle errors in the code block very well right now, so be sure to test the code you are supplying.
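Since errors in the code block are not handled well, one defensive option is to catch them inside the block itself. The following is only a sketch of that idea, not part of Executor: `getLinksSafely` is a hypothetical name, and the `error` field is an assumption (nothing above says Executor forwards extra fields to the callback). At minimum, `data` stays an empty array, so a request is still sent for that URL.

```javascript
// Hypothetical variant of getLinks. It sends the same request shape as the
// example above, plus an assumed `error` field when something goes wrong.
var getLinksSafely = function() {
    var request = {data: [], url: window.location.href};
    try {
        $('a').each(function(i, val) { request.data.push(val.href); });
    } catch (err) {
        // Record the failure instead of letting the exception escape.
        request.error = String(err);
    }
    // Always send a request, even if the scrape failed part-way.
    chrome.extension.sendRequest(request);
};
```

You would pass it as `code : getLinksSafely` in the spec. It relies on the same globals as the original example (`$`, `window`, `chrome`), so it only runs in a page where jQuery and the extension API are available.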
