Two-headed question here, guys:
First, I’ve been searching for a way to read .xlsx files in Python. Does xlrd read .xlsx files now? If not, what’s the recommended way to read and write such a file?
Second, I have two files with similar information: one primary field with scoping subfields (for example, coordinates (the primary field) -> city -> state -> country). In the older file, each record has an ID number, while the newer file (with records deleted/added) does not have these IDs. In Python, I’d 1) open the two files, 2) check the primary field of the older file against the primary field of the newer file, and merge their information into a new file if they match. Given that it’s not too big of a file, I don’t mind the O(n^2) complexity. My question is this: is there a well-defined way to do this in VBA or Excel? Everything I think of using Excel’s library seems too slow, and I’m not excellent with VBA.
I frequently access Excel files from Python, using either xlrd or the Excel COM object. For this job, xlrd won’t work because it does not support the xlsx format. But no matter: both approaches are overkill for what you are looking for. Simple Excel formulas will deliver what you want, specifically VLOOKUP.
VLOOKUP “looks for a value in the leftmost column of a table, and then returns a value in the same row from the column you specify”.
Some advice on VLOOKUP: First, if you want to match on multiple cells, create a “key” cell in both workbooks that concatenates the cells you are interested in. Second, make sure to set the last argument to VLOOKUP to FALSE, because you will only want exact matches.
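For anyone who would rather stay in Python, the same key-and-exact-match idea can be sketched with a dictionary. This is only an illustration, not something from the original files: the function names (make_key, merge_ids), the "id" field name, and the "|" separator are all made up for the example, and rows are assumed to be plain dicts.

```python
def make_key(row, fields):
    """Build one lookup key from several fields, like a helper "key" column.

    The "|" separator (an arbitrary choice) keeps ("ab", "c") distinct from
    ("a", "bc"). Lower-casing mirrors VLOOKUP, which ignores case when matching.
    """
    return "|".join(str(row[f]).lower() for f in fields)


def merge_ids(old_rows, new_rows, key_fields, id_field="id"):
    """Copy the ID from the matching old row onto each new row.

    Rows are assumed to be dicts; id_field is a hypothetical column name.
    """
    # Index the old rows by key. This plays the role of the VLOOKUP table.
    ids_by_key = {}
    for row in old_rows:
        # setdefault keeps the first occurrence, as an exact-match VLOOKUP does.
        ids_by_key.setdefault(make_key(row, key_fields), row[id_field])

    merged = []
    for row in new_rows:
        out = dict(row)  # copy so the caller's rows are not modified
        # None stands in for the #N/A that VLOOKUP returns when nothing matches.
        out[id_field] = ids_by_key.get(make_key(row, key_fields))
        merged.append(out)
    return merged
```

Because the old rows are indexed once, each lookup is a dictionary hit rather than a scan, so this avoids the O(n^2) comparison mentioned in the question.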
Regarding performance, Excel formulas are often very fast, and for a file that is not too big a column of VLOOKUPs should not be a problem.
Read the help file on VLOOKUP and ask further questions here.
Late edit (from Mark Baker’s answer): There is now a Python solution for xlsx. Openpyxl was created this year by Eric Gazoni to read and write Excel’s xlsx format.
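A minimal sketch of reading and writing an xlsx file with openpyxl, assuming a reasonably recent release is installed (the values_only argument is not available in very old versions). The helper names read_rows and write_rows are just for illustration, and only the first (active) sheet is handled.

```python
from openpyxl import Workbook, load_workbook


def read_rows(path):
    """Return every row of the active sheet as a tuple of cell values."""
    # read_only=True streams the file instead of loading it all into memory.
    wb = load_workbook(path, read_only=True)
    try:
        ws = wb.active
        return [row for row in ws.iter_rows(values_only=True)]
    finally:
        wb.close()  # read-only workbooks keep the file open until closed


def write_rows(path, rows):
    """Write an iterable of rows to the active sheet of a new workbook."""
    wb = Workbook()
    ws = wb.active
    for row in rows:
        ws.append(list(row))  # append adds one row below the existing data
    wb.save(path)
```

With these two helpers, the merge described in the question comes down to reading both files, matching rows on the primary field, and writing the result to a new file.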