I have a script that creates a directory listing of all PDFs within a certain series of subdirectories. The output is a list of tuples that include the year of the file, saved as a string, and an ID for the unit that generated the report. It looks something like the following:
unit1, 2010
unit2, 2002
unit2, 2005
unit2, 2010
unit3, 2003
What I’m now looking to do is create a report that finds the most recent report for each unit, based on the tuple that contains the max value in its second item. Normally I would do this in Access with a MAX query; however, I am trying to eliminate that step and write the extract all at once. Using my original code, my output would consist of the following:
unit1, '2010'
unit2, '2010'
unit3, '2003'
I did some looking around and realized that I need to change my script so that it generates a list of the tuples that match each unique ID. Using the great answer I found in "Split a list of tuples into sub-lists of the same tuple field", I was able to get the results split into a group of sublists. This means my output is now the following:
[[(unit1, '2010')], [(unit12, '2010'), (unit2, '2010'), (unit2, '2005'), (unit2, '2002')], [(unit3, '2003')]]
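For reference, here is a minimal sketch of how such sublists can be produced with `itertools.groupby`. The `records` list is hypothetical sample data modelled on the listing above, with the unit ID at index 0 and the year at index 1; the real script keys on a different index.

```python
import itertools
import operator

# Hypothetical sample data: (unit id, year stored as a string).
records = [
    ("unit2", "2005"),
    ("unit1", "2010"),
    ("unit2", "2010"),
    ("unit3", "2003"),
    ("unit2", "2002"),
]

get_unit = operator.itemgetter(0)

# groupby only merges adjacent items, so sort on the same key first.
records_sorted = sorted(records, key=get_unit)

# Materialize each group into a list; the group iterators are single-use.
sublists = [list(group)
            for _, group in itertools.groupby(records_sorted, key=get_unit)]

print(sublists)
```

Because `sorted` is stable, tuples within each sublist keep the order they had in the input.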
My difficulty now is trying to extract the tuple from each sublist that contains the highest value item. I tried the following:
import glob, os, itertools, operator
dirtup = []
for f in glob.glob('P:\Office*\Technical*\Bureau*\T*\*\YR2*\R*\*\*.pdf'):
    fpath, fname = os.path.split(f)
    fyr = fpath[91:95]
    vcs = 'Volume'
    rname, extname = os.path.splitext(fname)
    rcid = fname[0:7]
    dirtup.append ((f, fyr, rcid, vcs))
dirtup2 = sorted(dirtup, key=operator.itemgetter(2))
for key, group in itertools.groupby(dirtup2, operator.itemgetter(2)):
    maxval = max(x[1] for x in dirtup2)
    print [x for x in dirtup2 if x[1] == maxval]
This returns only the tuples that match the max of fyr across the whole list, rather than the max of fyr within each sublist.
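The likely cause is that the loop computes `max` over `dirtup2` (the whole list) instead of over `group`. Below is a minimal sketch of a per-group maximum, assuming tuples shaped like `(f, fyr, rcid, vcs)`. The `rows` data and file names are made up for illustration. Four-digit years stored as strings compare in the same order as the numbers they represent, so no conversion is needed here.

```python
import itertools
import operator

# Hypothetical rows shaped like the (f, fyr, rcid, vcs) tuples built above.
rows = [
    ("a.pdf", "2010", "unit1", "Volume"),
    ("b.pdf", "2002", "unit2", "Volume"),
    ("c.pdf", "2005", "unit2", "Volume"),
    ("d.pdf", "2010", "unit2", "Volume"),
    ("e.pdf", "2003", "unit3", "Volume"),
]

get_year = operator.itemgetter(1)
get_unit = operator.itemgetter(2)

rows_sorted = sorted(rows, key=get_unit)

latest = []
for unit, group in itertools.groupby(rows_sorted, key=get_unit):
    # max() iterates over this unit's group only, so the result is per unit.
    latest.append(max(group, key=get_year))

for row in latest:
    print(row)
```

If a unit has several files from the same latest year, `max` keeps only the first one it encounters.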
Edit
Using mglison’s first answer, I was able to get the desired output (the tuple whose second item has the max value).
You can sort each sublist in descending order on the particular field and take the first element of the sorted sublist.
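A minimal sketch of that approach, assuming the sublists hold `(unit, year)` tuples with the year at index 1. The `sublists` value is hypothetical sample data, and `reverse=True` is what puts the largest year first.

```python
import operator

# Hypothetical sublists, one per unit, each holding (unit id, year) tuples.
sublists = [
    [("unit1", "2010")],
    [("unit2", "2005"), ("unit2", "2010"), ("unit2", "2002")],
    [("unit3", "2003")],
]

get_year = operator.itemgetter(1)

# Sort each sublist newest-first, then index 0 is the most recent report.
latest = [sorted(sub, key=get_year, reverse=True)[0] for sub in sublists]

print(latest)
```

Sorting does more work than is strictly needed to find one maximum; `max(sub, key=get_year)` gives the same tuple for this data in a single pass, though for short sublists the difference is negligible.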