I have a question about inner-workings of Python sub-interpreter initialization (from Python/C API) and Python id() function. More precisely, about handling of global module objects in a WSGI Python containers (like uWSGI used with nginx and mod_wsgi on Apache).
The following code works as expected (isolated) in both of the mentioned environments, but I can not explain to my self why the id() function always returns the same value per variable, regardless of the process/sub-interpreter in which it is executed.
from __future__ import print_function
import os, sys
def log(*msg):
print(">>>", *msg, file=sys.stderr)
class A:
def __init__(self, x):
self.x = x
def __str__(self):
return self.x
def set(self, x):
self.x = x
a = A("one")
log("class instantiated.")
def application(environ, start_response):
output = "pid = %d\n" % os.getpid()
output += "id(A) = %d\n" % id(A)
output += "id(a) = %d\n" % id(a)
output += "str(a) = %s\n\n" % a
a.set("two")
status = "200 OK"
response_headers = [
('Content-type', 'text/plain'), ('Content-Length', str(len(output)))
]
start_response(status, response_headers)
return [output]
I have tested this code in uWSGI with one master process and 2 workers; and in mod_wsgi using a deamon mode with two processes and one thread per process. The typical output is:
pid = 15278
id(A) = 139748093678128
id(a) = 139748093962360
str(a) = one
on first load, then:
pid = 15282
id(A) = 139748093678128
id(a) = 139748093962360
str(a) = one
on second, and then
pid = 15278 | pid = 15282
id(A) = 139748093678128
id(a) = 139748093962360
str(a) = two
on every other. As you can see, id() (memory location) of both the class and the class instance remains the same in both processes (first/second load above), while at the same time class instances live in a separate context (otherwise the second request would show “two” instead of “one”)!
I suspect the answer might be hinted by Python docs:
id(object):Return the “identity” of an object. This is an integer (or long integer) which
is guaranteed to be unique and constant for this object during its lifetime. Two
objects with non-overlapping lifetimes may have the sameid()value.
But if that indeed is the reason, I’m troubled by the next statement that claims the id() value is object’s address!
While I appreciate the fact this could very well be just a Python/C API “clever” feature that solves (or rather fixes) a problem of caching object references (pointers) in 3rd party extension modules, I still find this behavior to be inconsistent with… well, common sense. Could someone please explain this?
I’ve also noticed mod_wsgi imports the module in each process (i.e. twice), while uWSGI is importing the module only once for both processes. Since the uWSGI master process does the importing, I suppose it seeds the children with copies of that context. Both workers work independently afterwards (deep copy?), while at the same time using the same object addresses, seemingly. (Also, a worker gets reinitialized to the original context upon reload.)
I apologize for such a long post, but I wanted to give enough details.
Thank you!
It’s not entirely clear what you’re asking; I’d give a more concise answer if the question was more specific.
First, the id of an object is, in fact–at least in CPython–its address in memory. That’s perfectly normal: two objects in the same process at the same time can’t share an address, and an object’s address never changes in CPython, so the address works neatly as an id. I don’t know how this violates common sense.
Next, note that a backend process may be spawned in two very distinct ways:
However, the end result in both of these cases is identical: separate, forked processes.
So, why are you ending up with the same IDs? That depends on which of the above methods is in use.
However, either way, once the fork happens they’re separate objects, in separate contexts. There’s no significance to objects in separate processes having the same ID.