I have an Excel .CSV file I’m attempting to read in with DictReader.
All seems to be well, except that it omits rows, specifically those with missing columns.
Our input looks like:
mail,givenName,sn,lorem,ipsum,dolor,telephoneNumber
ian.bay@blah.com,ian,bay,3424,8403,2535,+65(2)34523534545
mike.gibson@blah.com,mike,gibson,3424,8403,2535,+65(2)34523534545
ross.martin@blah.com,ross,martin,,,,+65(2)34523534545
david.connor@blah.com,david,connor,,,,+65(2)34523534545
chris.call@blah.com,chris,call,3424,8403,2535,+65(2)34523534545
So some of the rows have missing lorem/ipsum/dolor columns, and it’s just a string of commas for those.
We’re reading it in with:
    import csv

    def read_gd_dump(input_file="blah 20100423.csv"):
        # Open the file passed in rather than a hard-coded name.
        gd_extract = csv.DictReader(open(input_file), restval='missing', dialect='excel')
        return dict([(row['something'], row) for row in gd_extract])
I checked that “something” (the key for our dict) isn’t one of the missing columns, which is what I had originally suspected. It’s one of the columns after that.
However, DictReader seems to skip those rows completely. I tried setting restval to something, but it didn’t seem to make any difference. I can’t find anything in Python’s CSV docs (http://docs.python.org/library/csv.html) that would explain this behaviour, but I may have misread something.
Can’t reproduce your problem: when I save that data and then inspect list(gd_extract), I see five dicts, including those with missing ipsum etc. I fear that in your laudable attempt at simplifying the problem you’ve simplified it excessively, so that your bug has gone away.
If you have duplicates in column something (can’t check, since you don’t have that column in your sample data), that would of course explain the “apparently missing” rows: they’re not missing from the csv reader’s returned stream, they get “overwritten” in the dict you’re returning. Could that be the issue?
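To check the duplicate-key theory, here is a minimal sketch that compares how many rows the reader returns with how many survive once they are keyed into a dict. Since the real "something" column isn't in the sample, telephoneNumber is used purely as a hypothetical stand-in key; load_rows and duplicate_keys are illustrative helper names, not anything from the original code.

```python
import csv
import io
from collections import Counter

SAMPLE = """\
mail,givenName,sn,lorem,ipsum,dolor,telephoneNumber
ian.bay@blah.com,ian,bay,3424,8403,2535,+65(2)34523534545
mike.gibson@blah.com,mike,gibson,3424,8403,2535,+65(2)34523534545
ross.martin@blah.com,ross,martin,,,,+65(2)34523534545
david.connor@blah.com,david,connor,,,,+65(2)34523534545
chris.call@blah.com,chris,call,3424,8403,2535,+65(2)34523534545
"""

# Assumption: stand-in for the real 'something' column, which is not in the sample.
KEY = "telephoneNumber"


def load_rows(text):
    """Return every row DictReader yields, before any keying happens."""
    reader = csv.DictReader(io.StringIO(text), restval="missing", dialect="excel")
    return list(reader)


def duplicate_keys(rows, key):
    """Map each key value that occurs more than once to its number of occurrences."""
    counts = Counter(row[key] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


rows = load_rows(SAMPLE)
by_key = {row[KEY]: row for row in rows}  # later rows overwrite earlier ones

print(len(rows))                  # rows the reader actually returned
print(len(by_key))                # entries left after keying
print(duplicate_keys(rows, KEY))  # key values shared by several rows
```

If the first number is larger than the second, the rows are being lost in the dict, not in the reader. Note also that restval only fills in fields when a row is shorter than the header; a run of commas yields empty strings, which is why changing restval makes no difference here.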