I realise the info to answer this question is probably already on here, but as a Python newbie I've been trying to piece it together for a few weeks now and I'm running into trouble.
This question, Python "join" function like unix "join", answers how to do a join on two lists easily, but the problem is that DictReader objects are iterators rather than plain lists, which adds an extra layer of complication.
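To illustrate that complication: a csv.DictReader is a one-shot iterator, so once one loop has consumed it, a second loop over the same object sees no rows at all. In a nested loop over two readers, that means the inner loop only does real work for the first outer row. A small sketch, with made-up CSV text in io.StringIO standing in for a real file:

```python
import csv
import io

# Made-up CSV content standing in for a real file on disk.
data = io.StringIO("member,name\n1,alice\n2,bob\n")
reader = csv.DictReader(data)

first_pass = [row["member"] for row in reader]
second_pass = [row["member"] for row in reader]  # the reader is already exhausted

print(first_pass)   # ['1', '2']
print(second_pass)  # []
```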
I'm basically looking for an inner join on two CSV files, using DictReader objects. Here's the code I have so far:
```python
def test(dictreader1, dictreader2):
    matchedlist = []
    for dictline1 in dictreader1:
        for dictline2 in dictreader2:
            if dictline1['member']=dictline2['member']:
                matchedlist.append(dictline1, dictline2)
            else: continue
    return matchedlist
```
This gives me an error at the if statement, but more importantly, I don't seem to be able to access the ['member'] element of the dictionary within the iterable, as it says it has no attribute "__getitem__".
Does anyone have any thoughts on how to do this? For reference, I need to keep the lists as iterables because each file is too big to fit in memory. The plan is to call this function from another for loop that feeds it only a few lines at a time. So it will read one line of the left-hand file, iterate over the whole second file to find a matching member field, and then join the two lines, similar to an SQL join statement.
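The plan described above can be sketched without loading either file into memory by rewinding the right-hand file for every left-hand row. This is a minimal sketch, assuming Python 3, seekable text file objects (for example from open(path, newline='')), and a hypothetical streaming_inner_join generator; io.StringIO stands in for the real files:

```python
import csv
import io


def streaming_inner_join(left_file, right_file, key="member"):
    """Yield (left_row, right_row) pairs whose `key` values are equal.

    Only one left row is held at a time. The right file is rewound and
    re-read from the start for every left row, so memory use stays small
    but the right file is scanned once per left row.
    """
    for left_row in csv.DictReader(left_file):
        right_file.seek(0)  # rewind so the right file can be read again
        # A fresh DictReader re-reads the header line after the rewind.
        for right_row in csv.DictReader(right_file):
            if left_row[key] == right_row[key]:
                yield left_row, right_row


# Made-up sample data standing in for two CSV files.
left = io.StringIO("member,name\n1,alice\n2,bob\n")
right = io.StringIO("member,score\n2,90\n3,75\n")
for pair in streaming_inner_join(left, right):
    print(pair)
```

Because it is a generator, the caller can pull matches one at a time from an outer loop. The trade-off is speed: the right-hand file is read once per left-hand row.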
Thanks in advance for any help, and please forgive any obvious errors on my part.
A few thoughts:
- Replace the `=` with `==`. The latter is used for equality tests; the former for assignment.
- Add a line at the beginning, `dictreader2 = list(dictreader2)`. That makes it possible to loop over the dictionary entries more than once (at the cost of holding the second file in memory).
- Add a second pair of parentheses: `matchedlist.append((dictline1, dictline2))`. The list.append method takes just one argument, so you want to create a tuple out of dictline1 and dictline2.
- The final `else: continue` is unnecessary. A for-loop automatically moves on to the next item.
- Use a print statement or similar to verify that dictline1 and dictline2 are both dictionary objects that have member as a key. It could be that your function is correct but is being called with something other than a DictReader object.
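For the last point, a minimal sketch of such a check, assuming Python 3; check_rows is a hypothetical helper that wraps any iterable of rows and fails loudly on the first row that is not a dict with the expected key. Because it yields rows lazily, it does not consume a large reader up front. The error message lists the actual column names, which would reveal, for instance, a stray space or byte-order mark in the header:

```python
import csv
import io


def check_rows(rows, key="member"):
    """Yield rows unchanged, raising an error on the first bad row."""
    for row in rows:
        # DictReader rows are dict (or OrderedDict, a dict subclass, before 3.8).
        if not isinstance(row, dict):
            raise TypeError(f"expected a dict row, got {type(row).__name__}: {row!r}")
        if key not in row:
            raise KeyError(f"row has no {key!r} column; columns are {list(row)}")
        yield row


# Made-up CSV content standing in for a real file.
reader = csv.DictReader(io.StringIO("member,name\n1,alice\n"))
for row in check_rows(reader):
    print(type(row).__name__, row["member"])
```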
Here is a worked-out example using a list of dicts as input (similar to what a DictReader would return):
A further suggestion is to combine the two dictionaries into a single entry (this is closer to what an SQL inner join would do):
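One way to sketch that, assuming Python 3 and a hypothetical inner_join helper: copy each matching left row and update the copy with the right row's fields, so every result is one flat dict.

```python
def inner_join(rows1, rows2, key='member'):
    """Inner join two iterables of dicts on `key`, merging each matching pair.

    Fields from rows2 overwrite same-named fields from rows1; the join key
    holds equal values in both rows, so it is unaffected.
    """
    rows2 = list(rows2)  # allow repeated passes over the right-hand rows
    joined = []
    for row1 in rows1:
        for row2 in rows2:
            if row1[key] == row2[key]:
                merged = dict(row1)  # copy so the input row is not modified
                merged.update(row2)  # add (or overwrite) the right-hand fields
                joined.append(merged)
    return joined


# Made-up sample rows.
left = [{'member': '1', 'name': 'alice'}, {'member': '2', 'name': 'bob'}]
right = [{'member': '2', 'city': 'Paris'}, {'member': '3', 'city': 'Oslo'}]
print(inner_join(left, right))
```

If both files share a non-key column name, the right-hand value wins; rename one of the columns first if you need to keep both.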
Good luck with your project 🙂