What is the easiest way to programmatically extract structured data from a bunch of web pages?
I am currently using an Adobe AIR program I have written to follow the links on one page and grab a section of data off of the subsequent pages. This actually works fine, and for programmers I think this (or other languages) provides a reasonable approach, written on a case-by-case basis. Maybe there is a specific language or library that allows a programmer to do this very quickly, and if so I would be interested in knowing what they are.
Also, do any tools exist that would allow a non-programmer, like a customer support rep or someone in charge of data acquisition, to extract structured data from web pages without a lot of copy and paste?
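For the programmer case, the follow-links-then-extract pattern described above can be sketched in Perl with the WWW::Mechanize CPAN module; the URL, link filter, and target markup below are hypothetical placeholders:

```perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new;
$mech->get('http://example.com/listing');    # hypothetical index page

# Follow each link that looks like a detail page
for my $link ( $mech->find_all_links( url_regex => qr/detail/ ) ) {
    my $page = WWW::Mechanize->new;
    $page->get( $link->url_abs );

    # Grab the section of interest, e.g. whatever sits in <div id="data">...</div>
    if ( $page->content =~ m{<div id="data">(.*?)</div>}s ) {
        print "$1\n";
    }
}
```

In practice you would swap the regex for a proper HTML parser (HTML::TreeBuilder, pQuery, etc.), but the crawl loop itself stays this small.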
If you do a search on Stack Overflow for WWW::Mechanize & pQuery you will see many examples using these Perl CPAN modules. However, because you have mentioned "non-programmer", perhaps the Web::Scraper CPAN module may be more appropriate? It's more DSL-like, and so perhaps easier for a "non-programmer" to pick up.

Here is an example from the documentation for retrieving tweets from Twitter:
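Reproduced from memory of the Web::Scraper SYNOPSIS, so details may differ from the current POD, the example looks roughly like this:

```perl
use URI;
use Web::Scraper;

# Declare what to extract: each <li class="status"> becomes one
# entry in the resulting 'tweets' array, scraped by a nested scraper.
my $tweets = scraper {
    process "li.status", "tweets[]" => scraper {
        process ".entry-content",      body => 'TEXT';
        process ".entry-date",         when => 'TEXT';
        process 'a[rel="bookmark"]',   link => '@href';
    };
};

my $res = $tweets->scrape( URI->new("http://twitter.com/miyagawa") );

for my $tweet ( @{ $res->{tweets} } ) {
    print "$tweet->{body} $tweet->{when} (link: $tweet->{link})\n";
}
```

The declarative `process "selector", key => 'TEXT'` style is what makes it feel closer to a configuration file than a program, which is why it may suit your non-programmer users better. (Note the CSS selectors above match Twitter's markup at the time that documentation was written; the page structure has long since changed.)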