I’m currently building a Raspberry Pi based device that logs the power fed into the grid by a solar array.
The “main table” will grow at ~20 entries representing the “current” power produced by several parts of the array.
This isn’t much data and a Raspberry Pi can handle it with acceptable performance, but as the data grows, queries like “select last 10 years, group by month” probably won’t be very efficient. (The data should be displayed via an interactive web interface.)
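To make the kind of query concrete, here is a minimal sketch. It assumes SQLite accessed from Python; the table `power_log`, its columns `ts`, `source` and `watts`, and the function `monthly_average` are hypothetical names chosen for illustration, not the actual schema.

```python
import sqlite3

# Hypothetical schema: one row per reading, timestamp stored as ISO text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE power_log (ts TEXT, source TEXT, watts REAL)")
conn.executemany(
    "INSERT INTO power_log VALUES (?, ?, ?)",
    [
        ("2023-01-15 12:00:00", "string_a", 1200.0),
        ("2023-01-20 12:00:00", "string_b", 800.0),
        ("2023-02-10 12:00:00", "string_a", 1500.0),
    ],
)


def monthly_average(conn, since):
    """Return (month, average watts) for all readings at or after `since`."""
    return conn.execute(
        """
        SELECT strftime('%Y-%m', ts) AS month, AVG(watts) AS avg_watts
        FROM power_log
        WHERE ts >= ?
        GROUP BY month
        ORDER BY month
        """,
        (since,),
    ).fetchall()


# "Last 10 years" would be expressed by passing a cutoff ten years back.
print(monthly_average(conn, "2013-01-01 00:00:00"))
```

Without pre-aggregation, every call scans all rows in the requested range, which is what raises the concern about long time spans.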
I thought of doing some “background aggregation” and maintaining several tables that hold the aggregated data for various timeframes, but this seems like a problem many people have dealt with before.
What would you suggest?
You do not know how much data growth is needed to affect performance.
You do not know by how much performance will be affected then.
You do not know if performance will be affected at all.
As long as you do not have even an estimate of how much performance improvement you need, it does not make sense to try to optimize.
Or, as Donald Knuth put it: “premature optimization is the root of all evil.”
If you really do want to create caches of aggregated values, I’d suggest using triggers to keep the cache consistent after any change to the original data.
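The trigger approach could look roughly like the sketch below. It assumes SQLite (the question does not say which database is used) driven from Python's `sqlite3` module. The tables `power_log` and `monthly_totals`, their columns, the trigger names, and the helper `open_db` are all hypothetical and only illustrate the idea. The cache keeps a count and a sum per month, so an average can be derived without scanning the raw rows.

```python
import sqlite3

# Hypothetical schema: raw readings plus a per-month cache kept in sync by triggers.
SCHEMA = """
CREATE TABLE power_log (
    id    INTEGER PRIMARY KEY,
    ts    TEXT NOT NULL,      -- 'YYYY-MM-DD HH:MM:SS'
    watts REAL NOT NULL
);
CREATE TABLE monthly_totals (
    month     TEXT PRIMARY KEY,   -- 'YYYY-MM'
    n         INTEGER NOT NULL,   -- number of readings in the month
    watts_sum REAL NOT NULL       -- sum of readings in the month
);

CREATE TRIGGER power_log_ai AFTER INSERT ON power_log
BEGIN
    -- Make sure the month row exists, then add the new reading to it.
    INSERT OR IGNORE INTO monthly_totals (month, n, watts_sum)
        VALUES (strftime('%Y-%m', NEW.ts), 0, 0.0);
    UPDATE monthly_totals
       SET n = n + 1, watts_sum = watts_sum + NEW.watts
     WHERE month = strftime('%Y-%m', NEW.ts);
END;

CREATE TRIGGER power_log_ad AFTER DELETE ON power_log
BEGIN
    -- Remove the deleted reading's contribution (a month row with n = 0 is kept).
    UPDATE monthly_totals
       SET n = n - 1, watts_sum = watts_sum - OLD.watts
     WHERE month = strftime('%Y-%m', OLD.ts);
END;

CREATE TRIGGER power_log_au AFTER UPDATE ON power_log
BEGIN
    -- Treat an update as "remove old value, add new value".
    UPDATE monthly_totals
       SET n = n - 1, watts_sum = watts_sum - OLD.watts
     WHERE month = strftime('%Y-%m', OLD.ts);
    INSERT OR IGNORE INTO monthly_totals (month, n, watts_sum)
        VALUES (strftime('%Y-%m', NEW.ts), 0, 0.0);
    UPDATE monthly_totals
       SET n = n + 1, watts_sum = watts_sum + NEW.watts
     WHERE month = strftime('%Y-%m', NEW.ts);
END;
"""


def open_db(path=":memory:"):
    """Open a database and create the hypothetical tables and triggers."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    db = open_db()
    db.execute("INSERT INTO power_log (ts, watts) VALUES ('2023-01-15 12:00:00', 1200.0)")
    db.execute("INSERT INTO power_log (ts, watts) VALUES ('2023-01-20 12:00:00', 800.0)")
    # The web interface would read the small cache table instead of power_log.
    print(db.execute("SELECT month, n, watts_sum FROM monthly_totals").fetchall())
```

Because the triggers run inside the same transaction as the change to `power_log`, the cache cannot drift from the raw data, at the cost of a little extra work on every write.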