I had a similar question a few days ago that was solved, but some of my files have a slightly different header: there is a space before the first name, or a “.” at the end of a name, and then the same code just doesn’t work.
So, I have this data1:
 Year,Day,Hour,Min,Sec.,P1S1
2003, 1, 0, 1,30.09, 0.295E+04
2003, 1, 1, 0,11.84, 0.297E+04
2003, 1, 2, 0, 8.26, 0.338E+04
2003, 1, 3, 0, 4.69, 0.291E+04
2003, 1, 4, 0, 1.11, 0.337E+04
And I can read it with the following (note the space before Year in ‘ Year’, which is required for the file to be read):
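import pandas as pd
from datetime import datetime, timedelta

def parse(yr, doy, hr, min, sec):
    # The fields arrive as strings; convert them to numbers first.
    yr, doy, hr, min = [int(x) for x in [yr, doy, hr, min]]
    sec = float(sec)
    # Split fractional seconds into whole seconds and microseconds.
    mu_sec = int((sec - int(sec)) * 1e6)
    sec = int(sec)
    # Start from Dec 31 of the previous year so that day-of-year 1 is Jan 1.
    dt = datetime(yr - 1, 12, 31)
    delta = timedelta(days=doy, hours=hr, minutes=min, seconds=sec, microseconds=mu_sec)
    return dt + delta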
# notice the need of a space before Year in ' Year', that is needed to read the file!
pd.read_csv(data1, parse_dates=[[' Year','Day','Hour','Min','Sec.']], date_parser=parse, index_col=0)
Now, if I try the same with data2 (notice that there is now a ‘.’ after Min that didn’t exist in data1):
Year,Day,Hour,Min.,Sec.,P1S1
2003, 1, 0, 0, 0.00, 0.261E+04
2003, 1, 0, 5, 0.00, 0.281E+04
2003, 1, 0,10, 0.00, 0.268E+04
2003, 1, 0,15, 0.00, 0.305E+04
When I do:
pd.read_csv(data2, parse_dates=[[' Year','Day','Hour','Min','Sec.']], date_parser=parse, index_col=0)
I get an error because pandas does not expect the ‘.’ after ‘Min’. The same happens when a file has no space before ‘Year’, or when there is any other slight difference in the names of those first 5 header fields.
So, my question is: is there any way to make this more robust? I know the first 5 fields are always in this format; it’s only their names in the header that change.
If you know they’re always in the same positions, you can refer to the columns by index instead of by name, with something like
parse_dates=[[0,1,2,3,4]]
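As a rough sketch of the same positional idea, here is a variant that never looks at the header names at all. It is not the parse_dates/date_parser call from the question: it builds the timestamps after reading, which avoids date_parser (I believe recent pandas releases have deprecated it). The helper name read_by_position and the SAMPLE text are made up for illustration, and the sketch assumes the first five columns are always year, day of year, hour, minute and second.

```python
import io

import pandas as pd

# Illustrative sample shaped like data2 from the question (note 'Min.').
SAMPLE = """ Year,Day,Hour,Min.,Sec.,P1S1
 2003,  1, 0, 0, 0.00, 0.261E+04
 2003,  1, 0, 5, 0.00, 0.281E+04
"""


def read_by_position(source):
    """Hypothetical helper: build a datetime index from the first five
    columns, addressed by position so the header spelling is irrelevant."""
    df = pd.read_csv(source, skipinitialspace=True)

    year = df.iloc[:, 0].astype(int)
    doy = df.iloc[:, 1].astype(int)
    hour = df.iloc[:, 2].astype(int)
    minute = df.iloc[:, 3].astype(int)
    sec = df.iloc[:, 4].astype(float)

    # Jan 1 of each year, plus the offsets; day-of-year 1 means Jan 1.
    stamps = (
        pd.to_datetime(year.astype(str), format="%Y")
        + pd.to_timedelta(doy - 1, unit="D")
        + pd.to_timedelta(hour, unit="h")
        + pd.to_timedelta(minute, unit="min")
        + pd.to_timedelta(sec, unit="s")
    )

    # Keep only the data columns and index them by the timestamps.
    out = df.iloc[:, 5:].copy()
    out.index = pd.DatetimeIndex(stamps, name="timestamp")
    return out


df = read_by_position(io.StringIO(SAMPLE))
print(df)
```

The same function should work whether the header says ‘Min’ or ‘Min.’, and with or without the space before ‘Year’, because only the column positions are used.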