I’ve got 7000 data frames with columns
Date, X_1
Date, X_2
...
Each dataframe has around 2500 rows.
The dates sometimes overlap, but are not guaranteed to do so.
I’d like to combine them into a dataframe of the form
Date X_1 X_2 etc.
I tried applying combine_first 7000 times, but it was really slow, as it had to create 7000 new objects, each slightly bigger than the last one.
Is there a more efficient way to combine multiple dataframes?
Assuming that Date is the index rather than a column, you can do an “outer” join across all of the frames in one call.
Note: it may be more efficient to pass in a generator of DataFrames rather than a list.
For example:
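The snippet that originally followed is not preserved here, so this is only a minimal sketch of one way to do the outer join, assuming pd.concat with axis=1 is used. The helper make_frame and the three tiny frames are hypothetical stand-ins for the 7000 real ones.

```python
import pandas as pd


def make_frame(name, dates, values):
    # Hypothetical helper: one frame with a Date index and a single X_i column.
    index = pd.DatetimeIndex(pd.to_datetime(dates), name="Date")
    return pd.DataFrame({name: values}, index=index)


# Stand-ins for the real frames; the dates only partly overlap.
frames = [
    make_frame("X_1", ["2020-01-01", "2020-01-02"], [1.0, 2.0]),
    make_frame("X_2", ["2020-01-02", "2020-01-03"], [3.0, 4.0]),
    make_frame("X_3", ["2020-01-04"], [5.0]),
]

# One outer join over all frames at once, aligned on the Date index.
# The frames are passed as a generator here; a list works the same way.
combined = pd.concat((f for f in frames), axis=1, join="outer")

print(combined)
```

Dates missing from a given frame show up as NaN in that frame's column, and everything is aligned in a single pass instead of 7000 incremental ones.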
If 'Date' is a column, you can use set_index first to make it the index.
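As a rough sketch of that case, assuming the same pd.concat approach: each frame is re-indexed on its Date column lazily inside a generator, so no second list of frames is built. The two small frames in raw_frames are hypothetical examples.

```python
import pandas as pd

# Hypothetical frames where Date is an ordinary column, not the index.
raw_frames = [
    pd.DataFrame({"Date": pd.to_datetime(["2020-01-01", "2020-01-02"]),
                  "X_1": [1.0, 2.0]}),
    pd.DataFrame({"Date": pd.to_datetime(["2020-01-02", "2020-01-03"]),
                  "X_2": [3.0, 4.0]}),
]

# Move Date into the index of each frame, then outer-join them all on it.
merged = pd.concat((df.set_index("Date") for df in raw_frames), axis=1)

print(merged)
```

If Date is needed as a column again afterwards, merged.reset_index() should turn the index back into one.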