I am new to Python. I have been cleaning up a messy database using a combination of Google Refine (http://code.google.com/p/google-refine/) and Excel, but I think Python could do a better job if I can find some ‘recipes’ that I can reuse.
One variation of my problem is inconsistency in the ‘Location’ field of the database. About 95% of the data follows the format shown in the list Location1, which I have been able to process with Python more efficiently than with Excel filters. However, I am looking for a Python library or recipe that would let me work with every type of geo-location in the database, perhaps by defining patterns for the entries in the list.
Thanks in advance for your help!
Location1=['Washington, DC','Miami, FL','New York, NY']
Location2=['Kaslo/Nelson area (Canada), BC','Plymouth (UK/England)', 'Mexico, DF - outskirts-, (Mexico),']
Location3=['38.206471, -111.165271']
# This works for about 95% of the data: US addresses in the Location1 format
# strip() removes the space left after the comma (' DC' -> 'DC')
CityList=[loc.split(',',1)[0].strip() for loc in Location1]
StateList=[loc.split(',',1)[1].strip() for loc in Location1]
Not sure if you’re still having problems with this, but here’s an answer that I believe would work for you:
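As a rough sketch of the "defining patterns" idea from the question (not necessarily the exact code this answer had in mind), one option is to keep an ordered list of named regular expressions and try each in turn. The names `PATTERNS` and `parse_location`, and the two regexes themselves, are assumptions made up for illustration; only the standard-library `re` module is used.

```python
import re

# Hypothetical patterns, tried in order. The names and regexes are
# illustrative assumptions, not taken from the original post.
PATTERNS = [
    # 'lat, lon' decimal coordinates, e.g. '38.206471, -111.165271'
    ("coords", re.compile(
        r"^\s*(?P<lat>-?\d+(?:\.\d+)?)\s*,\s*(?P<lon>-?\d+(?:\.\d+)?)\s*$")),
    # 'City, ST' with a two-letter upper-case code, e.g. 'Miami, FL'
    ("city_state", re.compile(
        r"^\s*(?P<city>[^,()]+?)\s*,\s*(?P<state>[A-Z]{2})\s*$")),
]


def parse_location(loc):
    """Return (kind, fields) for the first pattern that matches loc.

    Entries that match no pattern come back as ('unknown', {'raw': loc})
    so they can be reviewed by hand or covered by a new pattern later.
    """
    for kind, pattern in PATTERNS:
        match = pattern.match(loc)
        if match:
            return kind, match.groupdict()
    return "unknown", {"raw": loc}


# Try one entry of each kind from the question's lists.
for loc in ["Washington, DC", "38.206471, -111.165271", "Plymouth (UK/England)"]:
    print(parse_location(loc))
```

With these two patterns, the irregular Location2 entries all fall through to 'unknown'. The idea would be to inspect that remainder and add a pattern for each format that recurs often enough to be worth automating.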
Output: