I’ve written a web crawler that I’d like to be able to stop via

Question

Asked: May 15, 2026 (2026-05-15T06:38:14+00:00)

I’ve written a web crawler that I’d like to be able to stop via the keyboard. I don’t want the program to die when I interrupt it; it needs to flush its data to disk first. I also don’t want to catch KeyboardInterrupt, because the exception can be raised at any point and the persistent data could be left in an inconsistent state.

My current solution is to define a signal handler that catches SIGINT and sets a flag; each iteration of the main loop checks this flag before processing the next URL.
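For readers following along, the flag-based approach described above might look roughly like the sketch below. The names stop_requested, handle_sigint, install_handler and crawl are hypothetical and not taken from the original crawler; process stands in for whatever fetches and stores a single URL.

```python
import signal

# Hypothetical module-level flag; the handler only flips it.
stop_requested = False


def handle_sigint(signum, frame):
    """Record that the user asked to stop; do no real work here."""
    global stop_requested
    stop_requested = True
    print("Interrupted; stopping...")


def install_handler():
    """Route SIGINT to the flag-setting handler (main thread only)."""
    signal.signal(signal.SIGINT, handle_sigint)


def crawl(urls, process):
    """Process URLs one at a time, checking the flag before each one."""
    done = []
    for url in urls:
        if stop_requested:
            break  # leave the loop cleanly so data can be flushed
        process(url)
        done.append(url)
    return done
```

A script would call install_handler() once at startup and flush its data after crawl() returns. The flag is only checked between URLs, so a blocking call that is already in progress when the signal arrives is still affected by it.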

However, I’ve found that if the system happens to be executing socket.recv() when I send the interrupt, I get this:

^C
Interrupted; stopping...  // indicates my interrupt handler ran
Traceback (most recent call last):
  File "crawler_test.py", line 154, in <module>
    main()
  ...
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/socket.py", line 397, in readline
    data = recv(1)
socket.error: [Errno 4] Interrupted system call

and the process exits completely. Why does this happen? Is there a way I can prevent the interrupt from affecting the system call?

1 Answer

Editorial Team · Answer 1 · 2026-05-15T06:38:15+00:00

socket.recv() calls the underlying POSIX recv() function in the C layer. If the process receives a signal such as SIGINT while recv() is blocked waiting for incoming data, and the signal handler returns, the call fails with the error code EINTR. In C you would check for this code to detect that recv() returned because of a signal rather than because data arrived on the socket. Python turns the error code into a socket.error exception; since nothing catches it, it propagates to the top level and terminates your application with the traceback you see. The solution is to catch socket.error, check the error code, and ignore the exception if the code equals errno.EINTR. Something like this:

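import errno
import socket

try:
    # do something
    result = conn.recv(bufsize)
except socket.error as e:
    # EINTR only means a signal interrupted the call; re-raise anything else.
    # Note that result is not assigned when the call was interrupted.
    if e.args[0] != errno.EINTR:
        raise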
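
Because the snippet above leaves result unassigned when recv() is interrupted, one option is to retry the call. The following is a minimal sketch under that assumption; recv_retrying and max_retries are hypothetical names, not part of the socket module. On Python 3.5 and later the interpreter already retries interrupted system calls when the signal handler does not raise (PEP 475), so this mainly matters on older versions such as the 2.6 shown in the traceback.

```python
import errno
import socket


def recv_retrying(conn, bufsize, max_retries=5):
    """Call conn.recv(bufsize), retrying when a signal interrupts it."""
    for _ in range(max_retries):
        try:
            return conn.recv(bufsize)
        except socket.error as e:
            # Anything other than EINTR is a real error; let it propagate.
            if e.args[0] != errno.EINTR:
                raise
    # Give up after repeated interruptions instead of looping forever.
    raise socket.error(errno.EINTR, "recv interrupted too many times")
```

In the traceback from the question the error is raised inside socket.py's readline, so the same check would have to wrap whichever higher-level call the crawler makes rather than a direct recv().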
