I’m a Perl programmer with some nice scripts that go fetch HTTP pages (from


Asked: May 17, 2026


I’m a Perl programmer with some nice scripts that go fetch HTTP pages (from a text file-list of URLs) with cURL and save them to a folder.

However, the number of pages to fetch is in the tens of millions. Sometimes the script fails around page 170,000 and I have to restart it manually. On restart it reads each URL, checks whether that page has already been downloaded, and skips it if so. But with a few hundred thousand pages done, it still takes a few hours to skip back up to where it left off. Obviously, this is not going to pan out in the end.

I’ve been told that instead of saving to a text file, which is hard to search and modify, I need to use a database. I don’t know much about databases; I only messed around with MySQL on a school server a year ago. I just need to be able to add millions of rows with a few static columns, search for and modify a single row quickly, and do all of this locally on a LAN (or on a single computer if that’s difficult). And of course, I need to access this database from Perl.

Where should I start? What do I need to download to get a server started on Windows? Which Perl modules should I use? (I’m using an ActiveState distro)


1 Answer

Editorial Team · Answer 1 · 2026-05-17T00:27:31+00:00


Since you only need to search on one column, consider a key/value store such as Berkeley DB, which you can use from Perl through either the BerkeleyDB module or the DB_File module.

Generally, you can think of these key/value databases as Perl hashes that live on disk rather than in memory. Exact key lookups are very fast. Any other kind of query, such as searching by value, requires scanning the whole dataset.
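As a rough illustration of the "hash on disk" idea, here is a minimal sketch using DB_File's tied-hash interface. It assumes DB_File is installed and built against a Berkeley DB library on your machine, which I have not checked for your ActiveState build. The helper name `fetch_pending`, its arguments, and the idea of storing a timestamp as the value are all my own placeholders, not something from your scripts.

```perl
use strict;
use warnings;
use DB_File;                      # tied-hash interface to Berkeley DB
use Fcntl qw(O_RDWR O_CREAT);     # open flags passed to tie

# fetch_pending() is a hypothetical helper, not part of any module.
#   $db_path : file holding the on-disk hash (created if missing)
#   $fetch   : code ref that downloads one URL, returns true on success
#   @urls    : URLs to process
# Returns the number of URLs fetched during this call.
sub fetch_pending {
    my ($db_path, $fetch, @urls) = @_;

    # Bind %done to the database file; reads and writes go to disk.
    tie my %done, 'DB_File', $db_path, O_RDWR | O_CREAT, 0644, $DB_HASH
        or die "Cannot open $db_path: $!";

    my $fetched = 0;
    for my $url (@urls) {
        # Exact-key lookup: skip anything recorded by this or an earlier run.
        next if exists $done{$url};

        if ($fetch->($url)) {
            $done{$url} = time;   # record success; the value is arbitrary
            $fetched++;
        }
    }

    untie %done;                  # flush and close the database file
    return $fetched;
}
```

You would call it with something like `fetch_pending('fetched_urls.db', \&download_page, @urls)`, where `download_page` stands in for your existing cURL code. After a crash, a restart skips each finished URL with a single keyed lookup against the database file.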
