Question

Asked: May 14, 2026

I’m looking to write a little web crawler in Python. I was starting to investigate writing it as a multithreaded script, with one pool of threads downloading and another pool processing the results. Given the GIL, would it actually download simultaneously? How does the GIL affect a web crawler? Would each thread pick some data off its socket, then yield to the next thread so it can pick some data off its socket, and so on?

Basically, I’m asking whether a multithreaded crawler in Python is really going to buy me much performance over a single-threaded one.

Thanks!

1 Answer

Editorial Team · Answer 1 · 2026-05-14T23:38:14+00:00

When it comes to crawling, you might be better off with something event-based such as Twisted, which uses non-blocking asynchronous socket operations to fetch and return data as it arrives, rather than blocking on each request.

Asynchronous network operations can easily be, and usually are, single-threaded. Network I/O almost always has far higher latency than CPU work, and you really have no idea how long a page is going to take to return. This is where async shines: a pending async operation is much lighter weight than a thread, so a single thread can keep many requests in flight at once.
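To make the single-threaded point concrete, here is a minimal sketch using the standard library's asyncio rather than Twisted. The `fake_fetch` helper and the example.com URLs are hypothetical stand-ins: `asyncio.sleep` simulates network latency, so nothing is actually downloaded. Ten simulated requests of 0.2 s each should finish in roughly the time of one, because they all wait concurrently on one event loop.

```python
import asyncio
import time


async def fake_fetch(url: str, delay: float = 0.2) -> str:
    """Hypothetical stand-in for a non-blocking download.

    asyncio.sleep simulates network latency without blocking the thread.
    """
    await asyncio.sleep(delay)
    return f"<html>body of {url}</html>"


async def fetch_all(urls):
    # gather() runs every coroutine on the same event loop, in one thread.
    return await asyncio.gather(*(fake_fetch(u) for u in urls))


def timed_fetch(urls):
    """Run all fetches and return (pages, elapsed_seconds)."""
    start = time.perf_counter()
    pages = asyncio.run(fetch_all(urls))
    return pages, time.perf_counter() - start


if __name__ == "__main__":
    urls = [f"http://example.com/page{i}" for i in range(10)]
    pages, elapsed = timed_fetch(urls)
    print(f"fetched {len(pages)} pages in {elapsed:.2f}s")
```

With a real crawler, the sleep would be replaced by a non-blocking HTTP client call; the structure stays the same.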

Edit: Here is a simple example of how to use Twisted’s getPage to create a simple web crawler.
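This is not the Twisted example itself (and note that getPage has since been deprecated in Twisted), but a rough sketch of the same shape of crawler using asyncio from the standard library. Everything site-specific here is an assumption for illustration: `FAKE_SITE` is a hypothetical in-memory set of pages, `fetch` stands in for a real non-blocking HTTP request, and the regex link extraction is deliberately naive.

```python
import asyncio
import re

# Hypothetical in-memory "site" so the sketch runs without network access.
FAKE_SITE = {
    "/": '<a href="/a">a</a> <a href="/b">b</a>',
    "/a": '<a href="/b">b</a> <a href="/c">c</a>',
    "/b": '<a href="/">home</a>',
    "/c": "no links here",
}

# Naive link extraction; a real crawler would use an HTML parser.
LINK_RE = re.compile(r'href="([^"]+)"')


async def fetch(url):
    """Stand-in for a real non-blocking HTTP request."""
    await asyncio.sleep(0.01)  # simulated latency
    return FAKE_SITE.get(url, "")


async def crawl(start, workers=3):
    """Crawl from `start`, returning the set of URLs discovered."""
    seen = {start}
    queue = asyncio.Queue()
    queue.put_nowait(start)

    async def worker():
        while True:
            url = await queue.get()
            try:
                body = await fetch(url)
                for link in LINK_RE.findall(body):
                    if link not in seen:  # skip pages already queued
                        seen.add(link)
                        queue.put_nowait(link)
            finally:
                queue.task_done()

    # All workers share one thread; they interleave while awaiting I/O.
    tasks = [asyncio.create_task(worker()) for _ in range(workers)]
    await queue.join()  # wait until every queued URL has been processed
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return seen


if __name__ == "__main__":
    print(sorted(asyncio.run(crawl("/"))))
```

The `workers` parameter caps how many requests are in flight at once, which plays the role a download thread pool would in the threaded design, without any extra threads.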
