I’ve known for a while that Python likes to reuse strings in memory instead of having duplicates:
>>> a = "test"
>>> id(a)
36910184L
>>> b = "test"
>>> id(b)
36910184L
However, I recently discovered that the string returned from raw_input() does not follow that typical optimization pattern:
>>> a = "test"
>>> id(a)
36910184L
>>> c = raw_input()
test
>>> id(c)
45582816L
I'm curious why this is the case. Is there a technical reason?
To me it appears that Python interns string literals, but strings created by some other process don't get interned.
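Here is a minimal sketch of that difference. The session in the question is Python 2; this sketch assumes Python 3, where the `intern` built-in lives at `sys.intern`. The strings built with `join` are only a stand-in for what `raw_input()` returns, and the variable names are made up for illustration.

```python
import sys

literal = "test"

# Build equal strings at runtime (no literal "test" involved),
# standing in for what raw_input() hands back.
typed_a = "".join(["te", "st"])
typed_b = "".join(["t", "est"])

print(typed_a == literal)   # equal contents
print(typed_a is literal)   # identity is an implementation detail; CPython typically reports False

# Explicit interning maps equal strings onto one shared object.
shared_a = sys.intern(typed_a)
shared_b = sys.intern(typed_b)
print(shared_a is shared_b)
```

So if you really want input strings to share storage, you can intern them yourself, but nothing does that for you automatically.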
Of course, raw_input is creating new strings without using string literals, so it's reasonable to assume that the result won't have the same id. There are (at least) two reasons why CPython interns strings: memory (you can save a lot if you don't store many copies of the same thing) and resolving hash collisions. If two strings hash to the same value (in a dictionary lookup, for instance), Python needs to check that the strings are actually equal. If they're not interned it has to do a full string comparison, but if they are interned, a pointer comparison is enough, which is a bit more efficient.
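The pointer-compare shortcut can be observed from Python, at least indirectly. In this rough sketch a hypothetical `Key` class stands in for a string (you can't hook into `str` comparison directly) and counts how often `__eq__` runs during dictionary lookups on CPython.

```python
class Key:
    """Hypothetical string stand-in that counts equality checks."""
    eq_calls = 0

    def __init__(self, text):
        self.text = text

    def __hash__(self):
        return hash(self.text)

    def __eq__(self, other):
        Key.eq_calls += 1
        return isinstance(other, Key) and self.text == other.text


same = Key("test")
equal_copy = Key("test")   # equal contents, different object
table = {same: 1}

table[same]                # identical object: the identity check is enough
calls_after_same_object = Key.eq_calls

table[equal_copy]          # same hash, different object: __eq__ has to run
calls_after_equal_object = Key.eq_calls

print(calls_after_same_object, calls_after_equal_object)
```

The first lookup never reaches `__eq__`, which is the situation interned strings put you in; the second one pays for a full comparison.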