(Python 2.7, Pandas 0.9)
This seems like a simple thing to do, but I can’t figure out how to calculate the difference between two date columns in a dataframe using Pandas. This dataframe already has an index, so making either column into a DatetimeIndex is not desirable.
To convert each date column from strings I used:
data.Date_Column = pd.to_datetime(data.Date_Column)
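A minimal sketch of that conversion step, assuming a recent pandas release rather than 0.9. The `format` string is an assumption based on the sample values in the question (month/day/year), and `data` here is a small hypothetical stand-in frame, not the asker's real one.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's frame; the real one has more
# columns and its own index.
data = pd.DataFrame({
    "Created_Date": ["1/1/2009 0:00", "1/1/2009 0:00"],
    "Closed_Date": ["1/7/2009 0:00", np.nan],
})

# Assumed format: month/day/year hour:minute, matching the sample strings.
fmt = "%m/%d/%Y %H:%M"
# Bracket assignment replaces each column in place; the index is untouched.
data["Created_Date"] = pd.to_datetime(data["Created_Date"], format=fmt)
# Missing values (NaN) become NaT rather than raising.
data["Closed_Date"] = pd.to_datetime(data["Closed_Date"], format=fmt)

print(data.dtypes)
```

Passing an explicit `format` avoids any month/day guessing; drop it if the real data mixes layouts.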
From there, to get elapsed time between 2 columns, I do:
data.Closed_Date - data.Created_Date
which returns an error:
TypeError: %d format: a number is required, not a numpy.timedelta64
Checking dtypes on both columns yields datetime64[ns], and the individual dates in the array are of type Timestamp.
What am I missing?
EDIT:
Here’s an example where I can create separate DateTimeIndex objects and accomplish what I want, but when I try to do it in the context of a dataframe, it fails.
Created_Date = pd.DatetimeIndex(data['Created_Date'], copy=True)
Closed_Date = pd.DatetimeIndex(data['Closed_Date'], copy=True)
Closed_Date.day - Created_Date.day
[Out] array([ -3, -16, 5, ..., 0, 0, 0])
Now the same but in a dataframe:
data.Created_Date = pd.DatetimeIndex(data['Created_Date'], copy=True)
data.Closed_Date = pd.DatetimeIndex(data.Closed_Date, copy=True)
data.Closed_Date.day - data.Created_Date.day
AttributeError: 'Series' object has no attribute 'day'
Here’s some of the data if you want to play around with it:
data['Created Date'][0:10].to_dict()
{0: '1/1/2009 0:00',
1: '1/1/2009 0:00',
2: '1/1/2009 0:00',
3: '1/1/2009 0:00',
4: '1/1/2009 0:00',
5: '1/1/2009 0:00',
6: '1/1/2009 0:00',
7: '1/1/2009 0:00',
8: '1/1/2009 0:00',
9: '1/1/2009 0:00'}
data['Closed Date'][0:10].to_dict()
{0: '1/7/2009 0:00',
1: nan,
2: '1/1/2009 0:00',
3: '1/1/2009 0:00',
4: '1/1/2009 0:00',
5: '1/12/2009 0:00',
6: '1/12/2009 0:00',
7: '1/7/2009 0:00',
8: '1/10/2009 0:00',
9: '1/7/2009 0:00'}
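For comparison, a minimal sketch assuming a recent pandas release (not the 0.9 in the question), using the first four sample rows above with underscores in place of the spaces in the column names. Subtracting the two converted columns returns a `timedelta64` Series aligned on the frame's existing index, and the missing close date propagates as `NaT`. The letter index is hypothetical, chosen only to show that the index is left alone.

```python
import numpy as np
import pandas as pd

fmt = "%m/%d/%Y %H:%M"  # assumed month/day/year ordering

# First four sample rows from the question, with a hypothetical
# non-default index to show the subtraction does not touch it.
data = pd.DataFrame(
    {
        "Created_Date": ["1/1/2009 0:00"] * 4,
        "Closed_Date": ["1/7/2009 0:00", np.nan, "1/1/2009 0:00", "1/1/2009 0:00"],
    },
    index=["a", "b", "c", "d"],
)
for col in ["Created_Date", "Closed_Date"]:
    data[col] = pd.to_datetime(data[col], format=fmt)

# Element-wise difference of two datetime columns -> timedelta Series.
elapsed = data["Closed_Date"] - data["Created_Date"]
print(elapsed)
```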
Update: A useful workaround is to just smash this with the DatetimeIndex constructor (which is usually much faster than an apply), for example:
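A minimal sketch of that workaround, assuming a recent pandas release and a small hypothetical frame with the question's column names. Note that `.day` is the day of the month, so this difference reproduces the asker's DatetimeIndex experiment but is not elapsed time once dates cross a month boundary.

```python
import pandas as pd

fmt = "%m/%d/%Y %H:%M"  # assumed month/day/year ordering

# Hypothetical frame mirroring the question's column names.
data = pd.DataFrame({
    "Created_Date": pd.to_datetime(["1/1/2009 0:00"] * 3, format=fmt),
    "Closed_Date": pd.to_datetime(
        ["1/7/2009 0:00", "1/12/2009 0:00", "1/10/2009 0:00"], format=fmt
    ),
})

# Wrapping each column in a DatetimeIndex exposes .day without changing
# the frame's own index.
closed_day = pd.DatetimeIndex(data["Closed_Date"]).day
created_day = pd.DatetimeIndex(data["Created_Date"]).day

# Difference in day-of-month, not elapsed days.
day_diff = closed_day - created_day
print(day_diff.tolist())  # [6, 11, 9]
```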
In 0.15 this will be available in the dt attribute (along with other datetime methods):
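A minimal sketch of the `.dt` route, assuming pandas 0.15 or later and a small hypothetical frame. The accessor works directly on the Series, so no separate DatetimeIndex objects are needed.

```python
import pandas as pd

fmt = "%m/%d/%Y %H:%M"  # assumed month/day/year ordering
data = pd.DataFrame({
    "Created_Date": pd.to_datetime(["1/1/2009 0:00", "1/1/2009 0:00"], format=fmt),
    "Closed_Date": pd.to_datetime(["1/7/2009 0:00", "1/12/2009 0:00"], format=fmt),
})

# .dt exposes DatetimeIndex-style fields on a Series and keeps its index.
day_diff = data["Closed_Date"].dt.day - data["Created_Date"].dt.day

# Other datetime fields hang off the same accessor (Monday == 0).
weekday = data["Closed_Date"].dt.dayofweek
print(day_diff.tolist(), weekday.tolist())
```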
Your error was the syntax: although one might hope it would work, it doesn’t:
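A minimal sketch of that failure, assuming a recent pandas release: a datetime Series still has no `.day` attribute of its own, while the same field is reachable through `.dt` (pandas 0.15 and later).

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2009-01-01", "2009-01-07"]))

# A Series holding datetimes does not expose .day directly...
try:
    dates.day
    error = None
except AttributeError as exc:
    error = exc  # 'Series' object has no attribute 'day'

# ...the field lives on the .dt accessor instead.
days = dates.dt.day
print(error, days.tolist())
```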
With more complicated selections like this one, you can use apply:
Be wary: you’ll probably need to handle these NaTs separately:
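A minimal sketch, assuming a recent pandas release and a small hypothetical frame. Dropping the `NaT` rows before `apply` is one way to deal with them separately: the lambda then only sees real `Timedelta` objects, and the open rows are kept aside.

```python
import numpy as np
import pandas as pd

fmt = "%m/%d/%Y %H:%M"  # assumed month/day/year ordering
data = pd.DataFrame({
    "Created_Date": pd.to_datetime(["1/1/2009 0:00"] * 3, format=fmt),
    "Closed_Date": pd.to_datetime(
        ["1/7/2009 0:00", np.nan, "1/10/2009 0:00"], format=fmt
    ),
})

elapsed = data["Closed_Date"] - data["Created_Date"]

# Drop NaT first so apply only calls .days on genuine Timedelta values.
days = elapsed.dropna().apply(lambda td: td.days)

# Rows with no close date, set aside for separate treatment.
still_open = data.index[elapsed.isna()]
print(days.tolist(), list(still_open))
```

The surviving rows keep their original index labels, so `days` can be joined back onto the frame if needed.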
Note: there is similarly strange behaviour when using .days on a timedelta with NaT:
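A minimal sketch, assuming a recent pandas release: on a timedelta Series, the vectorised `.dt.days` returns `NaN` for `NaT`, which forces a float dtype. Converting to the nullable `Int64` dtype is one optional way to keep whole days while still marking the missing ones.

```python
import numpy as np
import pandas as pd

# Hypothetical elapsed times; np.nan becomes NaT in a timedelta Series.
elapsed = pd.Series(pd.to_timedelta(["6 days", np.nan, "0 days"]))

# NaT maps to NaN, so the result is float64 rather than int.
days = elapsed.dt.days

# Optional: nullable integers keep whole days and mark the gap as <NA>.
days_int = days.astype("Int64")
print(days.tolist(), days_int.tolist())
```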