What is the best way to format data?
Here is the background :
I am using nameparser to parse a name the best way possible. I built a wrapper that calls the nameparser and then stores the parsed name in the database (MySQL).
What would be the most efficient approach in this case? Following is my approach.
- Step 1 : call nameparser (provide tablename, id, name, first,
middle, last, suffix). - Step 2 : Store the parsed (returned) name in
a dict of this form in memory (I am parsing relatively small names
set – say 20,000 names).{id:{'first':'John', 'middle':'V',
'last':'Doe', 'suffix':''} - Step 3 : Store the dict in the MySQL
table with one query? (Not sure if it is possible with the data
structure described in Step 2.
Here is my code :
#!/usr/bin/python
# -*- coding: utf-8 -*-
from nameparser import HumanName
import time
cursor = db.cursor()
def name(table, id, name, first, middle, last, suffix):
cursor.execute('SELECT `' + id + '`,`' + name + '` FROM `' + table
+ '` WHERE `' + name + '` IS NOT NULL AND ' + id
+ ' IS NOT NULL')
numrows = int(cursor.rowcount)
namelist = []
namelist = cursor.fetchall()
for record in namelist:
parsed = HumanName(record[1])
parsed.capitalize()
mydict[int(record[0])] = {
'first': str(parsed.first),
'middle': str(parsed.middle),
'last': str(parsed.last),
'suffix': str(parsed.suffix),
}
mydict = {}
starttime = time.time()
split = name('NamesToParse','id','name','first','middle','last','suffix')
print mydict
print time.time() - starttime
Please suggest the best way of storing the data in the MySQL table. This is what I have so far and I still have to loop through each record. I am wondering if there is a way to update the existing table rather than having to create a temp table first and then updating the original table in one go? Hope I made sense.
for id, val in mydict.items():
sorted_keys = sorted(map(str, val.keys()))
sorted_vals = map(encoding, [val[mydict] for mydict in sorted_keys]) # sorted by keys
formatted = ', '.join(["'%s'"] * len(sorted_vals))
db.execute("""insert into NamesToParseOut(%s) values (%s)""" % (', '.join(sorted_keys), formatted), sorted_vals)
Looks like I will take the List of tuples approach and insert in a temp table first followed by updating them with the original table. The time savings are amazing. I feel dictionary is an overkill for this task.