I am using yield to create a generator that returns chunks of a string that are being extracted using a regex and re.sub(). While I found an approach that worked, I am a bit confused about why it works one way but not another, as shown below:
This doesn’t work (processchunk() is not assigning to the chunk declared in splitmsg):
def splitmsg(msg):
    chunk = None
    def processchunk(match):
        chunk = match.group(1)
        return ""
    while True:
        chunk = None
        msg = re.sub(reCHUNK, processchunk, msg, 1)
        if chunk:
            yield chunk
        else:
            break
This does work (the only difference is that chunk is now a one-element list, chunks):
def splitmsg(msg):
    chunks = [None]
    def processchunk(match):
        chunks[0] = match.group(1)
        return ""
    while True:
        chunks[0] = None
        msg = re.sub(reCHUNK, processchunk, msg, 1)
        if chunks[0]:
            yield chunks[0]
        else:
            break
My question is basically: why does the scoping of the chunk/chunks variable seem to depend on whether it is a plain variable or a list?
In Python, a nested function can ‘pull’ variables from the surrounding scope as long as it only reads them. For example, if a function foo assigns a variable spam and defines an inner function bar that only reads spam, bar sees foo’s value, because spam is looked up in the surrounding scope, the foo function.

However, you cannot rebind a variable that belongs to a surrounding function scope: an assignment inside the inner function creates a new local variable instead. You can rebind global variables (if you declare them as global in your function), but you cannot do that for the variable spam in foo.

(Python 3 changes this: it adds a new keyword, nonlocal. If you declare spam as nonlocal inside bar, you can assign a new value to it from inside bar.)

Now to your list. What happens there is that you are not altering the variable chunks at all. Throughout your code, chunks points to one list, and only to that list. As far as Python is concerned, the variable chunks is not altered within the processchunk function.

What does happen is that you alter the contents of the list. You can freely assign a new value to chunks[0], because that is not the variable chunks; it is the first element of the list that chunks refers to. Python allows this because it is not a variable assignment but a list mutation.

So your ‘workaround’ is correct, if somewhat obscure. If you use Python 3, you can declare chunk as nonlocal within processchunk, and then the first version will work without a list.
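
The points above can be sketched in Python 3 as follows. foo, bar and spam are the names used in the explanation; baz, qux and the reCHUNK pattern are made up for illustration (the question never shows the real regex), so treat this as a minimal sketch rather than a drop-in replacement.

```python
import re

# Assumption: the question does not show reCHUNK. For illustration only,
# a chunk here is a run of non-space characters plus any trailing whitespace.
reCHUNK = re.compile(r"(\S+)\s*")


def foo():
    spam = "eggs"

    def bar():
        # Read only: spam is looked up in foo's scope.
        return spam

    def baz():
        # Hypothetical helper: assigning without nonlocal creates a new
        # local spam inside baz and leaves foo's spam untouched.
        spam = "ham"
        return spam

    def qux():
        # Hypothetical helper: nonlocal (Python 3 only) rebinds foo's spam.
        nonlocal spam
        spam = "bacon"

    before = bar()
    baz()
    after_baz = bar()
    qux()
    after_qux = bar()
    return before, after_baz, after_qux


def splitmsg(msg):
    chunk = None

    def processchunk(match):
        # Rebind the chunk that belongs to splitmsg, not a new local.
        nonlocal chunk
        chunk = match.group(1)
        return ""

    while True:
        chunk = None
        # Remove at most one match per pass; processchunk records it.
        msg = re.sub(reCHUNK, processchunk, msg, count=1)
        if chunk:
            yield chunk
        else:
            break
```

With this assumed pattern, list(splitmsg("alpha beta gamma")) gives ['alpha', 'beta', 'gamma'], and foo() returns ('eggs', 'eggs', 'bacon'): the assignment in baz never reaches foo's spam, while the one in qux does.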