I have searched for an answer to this while trying to teach myself the best way to do it. I’m new to Python and was hoping somebody knows a quick way to help me out! Here is an example of the input data file:
Lat,Long,Var,Id,Date Time
47.022,-104.330,10,MBVR,12/12/20 06:36:00
47.022,-104.330,11,MBVR,12/12/20 06:26:00
48.810,-104.253,10,MCOM,12/12/20 06:41:00
48.810,-104.253,13,MCOM,12/12/20 06:38:00
48.810,-104.253,12,MCOM,12/12/20 06:48:00
47.022,-104.330,11,MBVR,12/12/20 05:17:00
47.022,-104.330,10,MBVR,12/12/20 05:34:00
47.022,-104.330,12,MBVR,12/12/20 05:24:00
The file can have many different IDs; this is just a sample. I already have the part of the program working that ingests the data, separates it out, and writes an output file. Here is part of my code:
csv_max = 'X:\\csv\\lsrwnd.dat'
my_file = open(csv_max, "r")
rowadd = next(my_file)  # skip the header row
for line in my_file:
    # strip the trailing newline before splitting into fields
    items = line.strip().split(",")
    coords = items[0:2]
    wind = items[2]
    station = items[3]
    timestamp = items[4]
So here’s my problem from this point. What I need to do is create an output file containing just the maximum value of the “Var” column (wind in my code) for each “Id” (station in my code). So, if a station is repeated, I need the program to run through each occurrence of that station, find the max “Var”, and ONLY return the line with that max value for each respective station. From the data example above, all I want is:
48.810,-104.253,13,MCOM,12/12/20 06:38:00
47.022,-104.330,12,MBVR,12/12/20 05:24:00
and the rest can be dumped. For MCOM, the max Var is 13, and for MBVR the max is 12. So if there are 50 different stations, I need only 50 lines returned, each being the line with the max “Var” for that station. I can create the output file, but how can I get the max value for each station along with the other elements? I tried using dictionaries, but I’m having trouble figuring out how to return the whole line instead of just the station name and its maximum value. The “Date Time” variable isn’t important (in terms of needing the most recent or first occurrence). Thank you in advance for the help!
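For reference, here is a minimal sketch of the dictionary idea described above, assuming Python 3 and the column order shown in the sample. The trick is to store the whole row as the dictionary value, keyed by station, rather than just the number. The function name max_rows_by_station and the inline sample list are made up for illustration; with the real file you would pass the open file object instead.

```python
import csv


def max_rows_by_station(lines):
    """Return (header, best), where best maps station -> full row with the largest Var."""
    reader = csv.reader(lines)
    header = next(reader)  # skip "Lat,Long,Var,Id,Date Time"
    best = {}  # station -> complete row holding the largest Var seen so far
    for row in reader:
        if not row:
            continue  # ignore blank lines
        wind = float(row[2])  # compare numerically, not as strings
        station = row[3]
        # Store the whole row so every other field comes along with the max.
        # The strict ">" keeps the first row seen when two rows tie.
        if station not in best or wind > float(best[station][2]):
            best[station] = row
    return header, best


# Hypothetical stand-in for the real file: the sample rows from the question.
sample = [
    "Lat,Long,Var,Id,Date Time",
    "47.022,-104.330,10,MBVR,12/12/20 06:36:00",
    "47.022,-104.330,11,MBVR,12/12/20 06:26:00",
    "48.810,-104.253,10,MCOM,12/12/20 06:41:00",
    "48.810,-104.253,13,MCOM,12/12/20 06:38:00",
    "48.810,-104.253,12,MCOM,12/12/20 06:48:00",
    "47.022,-104.330,11,MBVR,12/12/20 05:17:00",
    "47.022,-104.330,10,MBVR,12/12/20 05:34:00",
    "47.022,-104.330,12,MBVR,12/12/20 05:24:00",
]

header, best = max_rows_by_station(sample)
for row in best.values():
    print(",".join(row))
```

This yields one line per station regardless of how many stations the file contains. To write the result out, the rows in best.values() could be passed to csv.writer(...).writerows() on an output file opened with newline="".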
The output.csv file now looks like this: