I am writing a web crawler that processes multiple URLs at the same time

Question

Editorial Team

Asked: June 1, 2026 (2026-06-01T05:58:46+00:00)

I am writing a web crawler that processes multiple URLs at the same time and works in the following way:

It gets a URL from a list of URLs included in seed_list.txt,
It crawls that URL and writes the data to data.txt,

just as most web crawlers do.

When the crawler is single-threaded, the data in data.txt appears in the same order as the URLs in seed_list.txt. When it is multi-threaded, I can't control the order, because each thread writes its data to data.txt as soon as it finishes.

Is there a way I can make my web crawler multi-threaded but keep the original order?

Thank you very much!

@Lance, Ignacio, and Maksym,

thank you all for your help – your answers definitely point me in the right direction.

1 Answer

Editorial Team · Answer 1 · 2026-06-01T05:58:48+00:00

You could create a class that holds the line number of the URL in seed_list.txt, the URL itself, and the data fetched from the web. Create an object of this type with the line number and URL, then pass it to a worker thread, which stores the fetched data in the object. The worker then hands the object to a writer thread, which orders the objects by line number and writes each one's data to data.txt as soon as all the earlier lines have been written.
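A minimal Python sketch of this idea follows. The language and the names `CrawlJob`, `crawl_in_order`, and `fetch` are assumptions made for illustration, not something the question specifies. `fetch` stands for whatever function downloads one URL and returns its data as a string. Worker threads fill in each job and put it on a queue; a single writer thread buffers jobs that finish early and writes only the next expected line number.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, TextIO


@dataclass
class CrawlJob:
    index: int                  # line number of the URL in seed_list.txt
    url: str
    data: Optional[str] = None  # filled in by a worker thread


def crawl_in_order(urls: Sequence[str], fetch: Callable[[str], str],
                   out: TextIO, workers: int = 4) -> None:
    """Fetch URLs concurrently, but write results in the original order."""
    jobs = [CrawlJob(i, url) for i, url in enumerate(urls)]
    done: "queue.Queue[CrawlJob]" = queue.Queue()

    def work(job: CrawlJob) -> None:
        try:
            job.data = fetch(job.url)
        except Exception as exc:
            # Record the failure so the writer is never left waiting.
            job.data = f"ERROR {job.url}: {exc}"
        finally:
            done.put(job)

    def write_all() -> None:
        pending = {}    # finished jobs that arrived ahead of their turn
        next_index = 0  # line number that must be written next
        while next_index < len(jobs):
            job = done.get()
            pending[job.index] = job
            # Flush every job that is now contiguous with what was written.
            while next_index in pending:
                out.write(pending.pop(next_index).data + "\n")
                next_index += 1

    writer = threading.Thread(target=write_all)
    writer.start()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for job in jobs:
            pool.submit(work, job)
    writer.join()
```

To use it, read the lines of seed_list.txt into a list, open data.txt for writing, and pass both to `crawl_in_order` along with your own download function. Because only the writer thread touches the output file, no lock is needed around the writes.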
