Basically, I have a list of 30,000 URLs.
The script goes through the URLs and downloads them (with a 3 second delay in between).
And then it stores the HTML in a database.
And it loops and loops…
Why does it randomly get “Killed.”? I didn’t touch anything.
Edit: this happens on 3 of my Linux machines.
The machines are on the Rackspace cloud with 256 MB of memory each. Nothing else is running.
Looks like you might be running out of memory. On Linux, "Killed" is what the shell prints when a process receives SIGKILL, and with only 256 MB the kernel's out-of-memory killer is a likely sender. That can easily happen in a long-running program if you have a "leak" (e.g., due to accumulating circular references). Does Rackspace offer any easily usable tools to keep track of a process's memory, so you can confirm whether this is the case? If not, this kind of thing is not hard to monitor with normal Linux tools from outside the process.
Once you have determined that "out of memory" is the likely cause of death, Python-specific tools such as pympler can help you track down exactly where the problem is coming from, and thus how to avoid those references (by changing them to weak references, or by some simpler approach) or otherwise remove the leaks.
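If you want to check the out-of-memory theory from inside the script itself, here is a minimal sketch using only the standard library (tracemalloc and resource) rather than pympler. It assumes Python 3 on Linux, where ru_maxrss is reported in kilobytes. The names process, run, report_every and cache are made up for illustration, and process deliberately leaks so that there is something to find; your real download-and-store step would go there instead.

```python
import gc
import resource      # Unix-only
import tracemalloc


def peak_rss_kb():
    """Peak resident set size of this process so far (kilobytes on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def top_growth(before, after, limit=5):
    """Source lines whose allocations changed the most between two snapshots."""
    return after.compare_to(before, "lineno")[:limit]


def process(url, cache):
    # Hypothetical stand-in for "download the page and store the HTML".
    # Keeping every payload in `cache` simulates a leak on purpose.
    cache.append(url * 500)


def run(urls, report_every=100):
    """Process the URLs and collect a memory report every `report_every` items."""
    tracemalloc.start()
    baseline = tracemalloc.take_snapshot()
    cache = []
    reports = []
    for i, url in enumerate(urls, start=1):
        process(url, cache)
        if i % report_every == 0:
            gc.collect()  # rule out garbage that is merely waiting for collection
            current = tracemalloc.take_snapshot()
            reports.append((i, peak_rss_kb(), top_growth(baseline, current)))
    tracemalloc.stop()
    return reports


if __name__ == "__main__":
    urls = ["http://example.invalid/%d" % n for n in range(300)]
    for count, rss, growth in run(urls):
        print(count, "URLs done, peak RSS", rss, "kB")
        for stat in growth:
            print("   ", stat)
```

If the peak RSS keeps climbing toward the machine's 256 MB and the same source line stays at the top of each report, that line is a good place to start looking for the leak. Taking snapshots has a cost of its own, so try this on a small slice of the URL list first.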