I have been trying to track down weird problems with my mod_wsgi/Python web application. I have the application handler which creates an object and calls a method:
def my_method(self, file):
    self.sapi.write("In my method for %d time"%self.mmcount)
    self.mmcount += 1
    # ... open file (absolute path to file), extract list of files inside
    # ... exit if file contains no path/file strings
    for f in extracted_files:
        self.num_files_found += 1
        self.my_method(f)
At the start and end of this, I write obj.num_files_found to the browser.
So this is a recursive function that goes down a tree of file-references inside files. Any references in a file are printed and then those references are opened and examined and so on until all files are leaf-nodes containing no files. Why I am doing this isn’t really important … it is more of a pedantic example.
You would expect the output to be deterministic, such as:
Files found: 0
In my method for the 0 time
In my method for the 1 time
In my method for the 2 time
In my method for the 3 time
...
In my method for the n time
Files found: 128
And for the first few requests it is as expected. Then I get the following for as long as I keep refreshing:
Files found: 0
In my method for the 0 time
Files found: 128
Even though I know, from previous refreshes with no code or file alterations, that it takes n calls to enumerate 128 files.
So the question then: Does mod_wsgi/Python include internal optimizations that would stop complete execution? Does it guess the output is deterministic and cache?
As a note, on the refreshes where the output is as expected, REMOTE_PORT increments by one each time. When the output is short, REMOTE_PORT jumps wildly. Might be unrelated, however.
I am new to Python, be gentle
Solved
Who knows what it was, but ripping out Apache, mod_python, mod_wsgi and nearly everything HTTP related and re-installing fixed the problem. Something was pretty broken but seems ok now 🙂
“Does mod_wsgi/Python include internal optimizations that would stop complete execution? Does it guess the output is deterministic and cache?”
No.
The problem is (generally) that you have a global variable somewhere in your program that is not getting reset the way you hoped it would.
Sometimes this can be unintentional, because Python looks a name up in the local namespace first and then falls back to the enclosing, global (module) and built-in namespaces.
You can, inadvertently, have a function that depends on some global variable. I’d bet on this.
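As a rough illustration of that kind of accidental dependency (not the asker's actual code: walk, visited and tree are made-up names), here is a recursive traversal that quietly relies on a module-level set. The first call does the full walk. The second call returns immediately, because the set was never reset.

```python
# Hypothetical module-level state; every name here is illustrative only.
visited = set()


def walk(name, tree):
    """Count references reachable from `name`, skipping names already seen."""
    # `visited` is not assigned inside this function, so Python resolves it
    # to the module-level set above. Nothing resets it between calls.
    if name in visited:
        return 0
    visited.add(name)
    count = 0
    for child in tree.get(name, []):
        count += 1 + walk(child, tree)
    return count


# A small stand-in for "files that reference other files".
tree = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}

first = walk("a", tree)   # full traversal: finds b, c and d
second = walk("a", tree)  # short-circuits: "a" is still in the global set
print(first, second)
```

Keeping that state on an object created per request, or passing it in as an argument, would make both calls behave the same way.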
What you’re likely seeing is a number of mod_wsgi daemon processes, each with a global variable problem. The first request for each daemon works. Then your global variable is in a state that prevents work from happening. [File is left open, top-level directory variable got overwritten, who knows?]
After the first few, all the daemons are stuck in the “other” mode where they report the answer without doing the real work.
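To see why state survives between requests, here is a minimal sketch of a WSGI callable with a module-level counter. It is called directly twice, the way a server process would call it for two consecutive requests. The names request_count and call_once are hypothetical and are not part of mod_wsgi.

```python
# Module-level state lives as long as the process, not as long as one request.
request_count = 0


def application(environ, start_response):
    """A tiny WSGI app that reports how many requests this process has served."""
    global request_count
    request_count += 1
    body = ("request %d in this process\n" % request_count).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_once():
    """Invoke the app directly, roughly as a WSGI server would, and return the body."""
    def start_response(status, headers):
        pass  # a real server would send the status line and headers here

    chunks = application({}, start_response)
    return b"".join(chunks).decode("utf-8")


outputs = [call_once(), call_once()]
print(outputs)
```

With several daemon processes, each one keeps its own copy of request_count, so which value you see depends on which process handled the request. Writing os.getpid() into the response is a simple way to tell the processes apart while debugging.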