I am doing a bulk import of dbf files into SQLite. I wrote a simple Python script using the dbfpy module from http://dbfpy.sourceforge.net/. It works as expected except in a small number of cases, where the module seems to add a few erroneous records to the table it is reading.
I know this sounds crazy, but it really seems to be the case. I exported the dBase file in question to CSV using OpenOffice and imported it directly into SQLite using .import, and the 3 extra records are not there.
But if I iterate through the file using Python and the dbfpy module, the 3 extra records appear.
Is it possible that these three records were flagged as deleted in the dbf file and, while invisible to OpenOffice, are being picked up by the dbfpy module? I could be way off, but I am really scratching my head on this one.
Any help is appreciated.
What follows is a sample of my method for reading the dbf file. I have removed the loop over files and used a single file instead.
import sqlite3 as lite
from dbfpy import dbf

file_name = "../data/test.dbf"  # single case; normally set by the loop over files
conn = lite.connect('../data/my_dbf.db3')
# avoids sqlite3's "8-bit bytestrings" error
conn.text_factory = str
cur = conn.cursor()
rows_list = []
db = dbf.Dbf(file_name)
for rec in db:
    # skip records flagged as deleted
    if not rec.deleted:
        row_tuple = (rec["name"], rec["address"], rec["age"])
        rows_list.append(row_tuple)
print(file_name + " processed")
db.close()
cur.executemany("INSERT INTO exported_data VALUES(?, ?, ?)", rows_list)
#pprint.pprint(rows_list)
conn.commit()
Solution
OK, after another half hour of testing before lunch I discovered that my hypothesis was in fact correct: some files had not been packed and so still contained records that had been flagged for deletion. They should not have been left unpacked after export, which caused more confusion.
I manually packed one file and tested it and it immediately returned the proper results.
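To find out which files still carry deleted records before packing them, the deletion flags can be read straight from the file. What follows is only a rough sketch using the standard library, not dbfpy. It assumes a dBASE III-style layout (record count at header bytes 4-7, header length at bytes 8-9, record length at bytes 10-11, and a one-byte flag at the start of each record where "*" means deleted). The function name count_deleted is made up for illustration.

```python
import struct


def count_deleted(path):
    """Return (total, deleted) record counts for a dBASE III-style .dbf file."""
    with open(path, "rb") as f:
        header = f.read(32)
        # Bytes 4-7: record count, 8-9: header length, 10-11: record length
        # (all little-endian).
        total, header_len, record_len = struct.unpack("<IHH", header[4:12])
        f.seek(header_len)
        deleted = 0
        for _ in range(total):
            record = f.read(record_len)
            if len(record) < record_len:
                break  # file is shorter than the header claims
            # The first byte of every record is the deletion flag:
            # b"*" marks a deleted record, b" " an active one.
            if record[:1] == b"*":
                deleted += 1
    return total, deleted
```

Calling count_deleted("../data/test.dbf") returns a pair. The first number is the record count stored in the header, which includes the flagged records, and a non-zero second number means the file has not been packed.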
A big thanks for the help on this. I have added the solution given below to ignore the deleted records. I had searched and searched for this attribute (deleted) in the module but could not find an API doc for it. I even looked in the code, but in the fog of it all it must have slipped by. Thanks a million for the solution and help, guys.
If you want to discard records marked as deleted, check rec.deleted inside the loop and skip the record when it is true (if not rec.deleted: ...).