Can someone explain this to me? So I’ve been playing with the id() function in Python and came across this:
>>> id('cat')
5181152
>>> a = 'cat'
>>> b = 'cat'
>>> id(a)
5181152
>>> id(b)
5181152
This makes some sense to me except for one part: The string ‘cat’ has an address in memory before I assign it to a variable. I probably just don’t understand how memory addressing works but can someone explain this to me or at least tell me that I should read up on memory addressing?
So that is all well and good but this confused me further:
>>> a = a[0:2]+'t'
>>> a
'cat'
>>> id(a)
39964224
>>> id('cat')
5181152
This struck me as weird because 'cat' is a string with an address of 5181152 but the new a has a different address. So if there are two 'cat' strings in memory, why aren’t two addresses printed for id('cat')? My last thought was that the concatenation had something to do with the change in address, so I tried this:
>>> id(b[0:2]+'t')
39921024
>>> b = b[0:2]+'t'
>>> b
'cat'
>>> id(b)
40000896
I would have predicted the IDs to be the same but that was not the case. Thoughts?
Python reuses string literals fairly aggressively. The rules by which it does so are implementation-dependent, but CPython uses two that I’m aware of:
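1. Strings that contain only characters valid in Python identifiers are interned: they are stored in a table and reused wherever they occur. So no matter where you use "cat", it always refers to the same string object.
2. String literals with the same value in the same code block share one object, regardless of their content or length. The same literal in two different code blocks may not:
def foo(): return "pack my box with five dozen liquor jugs"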
def bar(): return "pack my box with five dozen liquor jugs"
assert foo() is bar() # AssertionError
Both optimizations are done at compile time (that is, when the bytecode is generated).
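To see the compile-time part for yourself, here is a minimal sketch, assuming CPython (other implementations may behave differently). The function name same_block is made up for illustration. It checks that a literal repeated inside one code block is stored once in that block's constants and that both uses refer to the same object.

```python
def same_block():
    # Two occurrences of one literal inside a single code block.
    first = "pack my box with five dozen liquor jugs"
    second = "pack my box with five dozen liquor jugs"
    return first is second

# The compiler stores the literal in the code object's constants tuple;
# count how many times it appears there.
literal_count = same_block.__code__.co_consts.count(
    "pack my box with five dozen liquor jugs"
)
shared = same_block()

print(literal_count, shared)
```

Nothing here is computed at run time: the sharing is already visible in the code object before the function is ever called.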
On the other hand, something like
chr(99) + chr(97) + chr(116)
is a string expression that evaluates to the string "cat". In a dynamic language like Python, its value can’t be known at compile time (chr() is a built-in function, but you might have reassigned it), so it normally isn’t interned. Thus its id() is different from that of "cat". However, you can force a string to be interned using the intern() function (sys.intern() in Python 3): calling it on both the literal and the computed string returns the same object.
As others have mentioned, interning is possible because strings are immutable. It isn’t possible to change "cat" to "dog", in other words. You have to generate a new string object, which means that there’s no danger that other names pointing to the same string will be affected.
Just as an aside, Python also converts expressions containing only constants (like "c" + "a" + "t") to constants at compile time, which you can confirm by disassembling the compiled code with the dis module. These will be optimized to point to identical string objects per the rules above.
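The last few points can be sketched in a few lines. This is only an illustration, assuming CPython 3, where intern() lives in the sys module. The variable names and the "<example>" filename are made up. The exact disassembly listing varies between versions, so no output is shown.

```python
import dis
import sys

# Built at run time: the compiler cannot fold calls to chr().
built = chr(99) + chr(97) + chr(116)
literal = "cat"

equal_value = built == literal
# sys.intern() returns the one canonical object for a given string value.
same_after_intern = sys.intern(built) is sys.intern(literal)

# An expression made only of constants is folded when it is compiled.
folded = compile('"c" + "a" + "t"', "<example>", "eval")
folded_in_consts = "cat" in folded.co_consts

print(equal_value, same_after_intern, folded_in_consts)
dis.dis(folded)  # the listing refers to the constant 'cat', not to two additions
```

Whether built is literal holds before interning is left unchecked on purpose, since that is exactly the implementation-dependent part.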