I have a script that reads system log files into pandas dataframes and produces charts from them. The charts are fine for small data sets, but when the data sets grow because data is gathered over a longer timeframe, the charts become too crowded to read.
I am planning to resample the dataframe so that, once the dataset passes a certain size, it is reduced to at most SIZE_LIMIT rows. This means every n = actual_size/SIZE_LIMIT consecutive rows would be aggregated into a single row in the new dataframe. The aggregation can be either the average value or just every nth row taken as is.
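For reference, both aggregations described above can be sketched with plain `groupby` and `iloc`. This is a minimal sketch, not the asker's code: `downsample` and the value of `SIZE_LIMIT` are hypothetical, and it assumes all columns are numeric so that `mean()` applies to every column. Rounding n up guarantees the result never exceeds SIZE_LIMIT rows.

```python
import math

import numpy as np
import pandas as pd

SIZE_LIMIT = 100  # hypothetical maximum number of rows to plot


def downsample(df: pd.DataFrame, size_limit: int = SIZE_LIMIT, how: str = "mean") -> pd.DataFrame:
    """Reduce df to at most size_limit rows (hypothetical helper; assumes numeric columns)."""
    if len(df) <= size_limit:
        return df  # small enough already, plot as is
    # Rows per group, rounded up so the result never exceeds size_limit rows.
    n = math.ceil(len(df) / size_limit)
    if how == "mean":
        # Label rows by block number (0,...,0,1,...,1,...) and average each block.
        # Note: the result is indexed by block number, not by the original index.
        return df.groupby(np.arange(len(df)) // n).mean()
    # Otherwise keep every nth row unchanged, preserving the original index.
    return df.iloc[::n]
```

With 1050 rows and SIZE_LIMIT = 100, n is 11 and both variants return 96 rows.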
I am not fully versed in pandas, so I may have missed some obvious way to do this.
You could use the `pandas.qcut` function on the index to divide it into equal-sized quantile bins. Since `qcut` takes the number of bins to create, the value to pass is SIZE_LIMIT (actual_size/SIZE_LIMIT is the number of rows per bin, not the number of bins).

Here, grouping the index by `qcut(df.index, 5)` results in 5 groups of (nearly) equal size. I then take the mean of each group.
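A self-contained sketch of this approach, assuming hypothetical example data and a hypothetical `SIZE_LIMIT`. One deliberate change from grouping on `df.index`: it bins the row positions (`np.arange(len(df))`) instead, so it also works when the index is not numeric, for example a timestamp index built from log entries. `labels=False` makes `qcut` return plain integer bin numbers that `groupby` can use directly.

```python
import numpy as np
import pandas as pd

SIZE_LIMIT = 5  # hypothetical target number of rows

# Hypothetical example data: 23 rows of numeric values.
df = pd.DataFrame({"value": np.arange(23, dtype=float)})

# qcut splits the row positions into SIZE_LIMIT bins with (nearly) equal row counts.
bins = pd.qcut(np.arange(len(df)), SIZE_LIMIT, labels=False)

# Average each bin; the result has exactly SIZE_LIMIT rows, indexed 0..SIZE_LIMIT-1.
reduced = df.groupby(bins).mean()
print(reduced)
```

Compared with grouping by integer division, `qcut` always yields exactly SIZE_LIMIT groups and spreads the leftover rows across bins instead of leaving a short final group. It assumes the frame has at least SIZE_LIMIT rows; otherwise the bin edges are not unique and `qcut` raises an error.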