I’ve written a web crawler that I’d like to be able to stop via the keyboard. I don’t want the program to die when I interrupt it; it needs to flush its data to disk first. I also don’t want to catch KeyboardInterrupt, because the exception could arrive at any point and leave the persistent data in an inconsistent state.
My current solution is to define a signal handler that catches SIGINT and sets a flag; each iteration of the main loop checks this flag before processing the next URL.
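For reference, that arrangement looks roughly like the sketch below. The names handle_sigint, crawl, process_url and flush are placeholders for illustration, not the actual crawler code.

```python
import signal

stop_requested = False  # set by the handler, read by the main loop


def handle_sigint(signum, frame):
    """Record the interrupt instead of letting KeyboardInterrupt be raised."""
    global stop_requested
    stop_requested = True
    print("Interrupted; stopping...")


def crawl(urls, process_url, flush):
    """Process URLs until they run out or SIGINT has been received.

    process_url and flush are hypothetical callables supplied by the caller.
    """
    previous = signal.signal(signal.SIGINT, handle_sigint)
    try:
        for url in urls:
            if stop_requested:
                break  # stop only between URLs, never mid-update
            process_url(url)
    finally:
        # Put back whatever handler was installed before the crawl.
        signal.signal(signal.SIGINT, previous)
    flush()  # data is consistent here, so it is safe to write it out
```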
However, I’ve found that if the system happens to be executing socket.recv() when I send the interrupt, I get this:
^C
Interrupted; stopping... // indicates my interrupt handler ran
Traceback (most recent call last):
  File "crawler_test.py", line 154, in <module>
    main()
  ...
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/socket.py", line 397, in readline
    data = recv(1)
socket.error: [Errno 4] Interrupted system call
and the process exits completely. Why does this happen? Is there a way I can prevent the interrupt from affecting the system call?
socket.recv() calls the underlying POSIX-compliant recv function in the C layer, which in turn fails with the error code EINTR when the process receives a SIGINT while waiting for incoming data in recv(). This error code can be used on the C side (if you were programming in C) to detect that recv() returned not because more data is available on the socket but because the process received a SIGINT. Python turns this error code into an exception, and since it is never caught, it terminates your application with the traceback you see. The solution is simply to catch socket.error, check the error code, and if it is equal to errno.EINTR, ignore the exception silently. Something like this:
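A minimal sketch of that check, assuming a hypothetical helper named recv_retrying that wraps a single recv() call and simply tries again after an interruption:

```python
import errno
import socket


def recv_retrying(sock, bufsize):
    """Call sock.recv(bufsize), retrying if a signal interrupts the call.

    recv_retrying is an illustrative name, not part of the socket module.
    """
    while True:
        try:
            return sock.recv(bufsize)
        except socket.error as e:
            # EINTR only means a signal handler ran while recv() was
            # blocked; nothing was read, so calling recv() again is safe.
            if e.errno != errno.EINTR:
                raise  # any other socket error is a real failure
```

In the traceback above, recv() is called from inside socket.py's readline, so in the crawler the same try/except would have to go around whichever higher-level call ends up reading from the socket. On Python 3.5 and later the interpreter retries interrupted system calls on its own when the signal handler does not raise (PEP 475), so a wrapper like this mainly matters on older versions such as the 2.6 shown here.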