I’m looking for a statistics package for Perl (CPAN is fine) that allows me to add data incrementally instead of having to pass in an entire array of data.
Only the mean, median, stddev, max, and min are necessary, nothing too complicated.
The reason is that my dataset is far too large to fit into memory. The data source is a MySQL database, so right now I’m querying a subset of the data at a time, computing the statistics for each subset, and then combining the results from all the manageable subsets later.
If you have other ideas on how to overcome this issue, I’d be much obliged!
Statistics::Descriptive::Discrete allows you to do this in a manner similar to Statistics::Descriptive, but it has been optimized for use with large data sets. For example, the documentation reports an improvement of two orders of magnitude (100x) in memory usage.
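Here is a minimal sketch of incremental use, assuming the module is installed from CPAN. The `@chunks` array is a hypothetical stand-in for batches of rows fetched from MySQL; in real code each batch would come from your own query loop. Note that the module is aimed at data with a limited number of distinct values, so the memory savings will probably be smaller if nearly every value in your table is unique.

```perl
use strict;
use warnings;
use Statistics::Descriptive::Discrete;

my $stats = Statistics::Descriptive::Discrete->new;

# Hypothetical batches standing in for subsets pulled from the database.
my @chunks = ( [ 3, 1, 4 ], [ 1, 5 ] );

# add_data can be called repeatedly; each call adds to the running data set.
for my $chunk (@chunks) {
    $stats->add_data(@$chunk);
}

printf "count  = %d\n", $stats->count;
printf "min    = %s\n", $stats->min;
printf "max    = %s\n", $stats->max;
printf "mean   = %s\n", $stats->mean;
printf "median = %s\n", $stats->median;
printf "stddev = %s\n", $stats->standard_deviation;
```

Because all of the data ends up in one object, there is no need to combine per-subset statistics afterwards, which matters most for the median, since it cannot be reconstructed exactly from the medians of the subsets.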