I need advice on the best approach to storing statistical data. I have a Django project with a MySQL database of 30,000 online games.
Each game has three statistical parameters:
- number of views,
- number of plays,
- number of likes
Now I need to store the daily history of these three parameters, so I was thinking of creating a single table with five columns:
game ID, number of views, number of plays, number of likes, and date (day-month-year).
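A minimal sketch of the proposed table. It uses Python's built-in `sqlite3` only so the example runs standalone; in the real project this would be a Django model on MySQL. The table name `game_stats` and the helpers `log_day` and `history` are hypothetical. A composite primary key on `(game_id, day)` prevents duplicate daily rows and also serves as the index for per-game date-range lookups.

```python
import sqlite3
from datetime import date

# Hypothetical schema mirroring the five proposed columns.
SCHEMA = """
CREATE TABLE game_stats (
    game_id INTEGER NOT NULL,
    views   INTEGER NOT NULL DEFAULT 0,
    plays   INTEGER NOT NULL DEFAULT 0,
    likes   INTEGER NOT NULL DEFAULT 0,
    day     TEXT    NOT NULL,          -- ISO date, e.g. '2014-03-15'
    PRIMARY KEY (game_id, day)         -- one row per game per day
)
"""

def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn

def log_day(conn, game_id: int, day: date, views: int, plays: int, likes: int) -> None:
    # INSERT OR REPLACE keeps (game_id, day) unique if the daily job runs twice.
    conn.execute(
        "INSERT OR REPLACE INTO game_stats (game_id, views, plays, likes, day) "
        "VALUES (?, ?, ?, ?, ?)",
        (game_id, views, plays, likes, day.isoformat()),
    )

def history(conn, game_id: int, start: date, end: date):
    # Range scan on the primary key: only this game's rows in [start, end].
    # ISO date strings sort chronologically, so BETWEEN works on TEXT.
    cur = conn.execute(
        "SELECT day, views, plays, likes FROM game_stats "
        "WHERE game_id = ? AND day BETWEEN ? AND ? ORDER BY day",
        (game_id, start.isoformat(), end.isoformat()),
    )
    return cur.fetchall()
```

Because every graph query filters on one game and a date range, the primary-key index keeps each lookup small no matter how many total rows accumulate.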
Each game gets one row per day, so the table grows by 30,000 rows a day: 300,000 rows after 10 days and 10,950,000 rows after a year. I'm not a DBA, but this tells me it will quickly become a performance problem, not to mention what will happen in five years.
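The growth estimate is simple arithmetic. A quick Python check (ignoring leap years; `rows_after` is a hypothetical helper) confirms the figures and extends them to five years:

```python
GAMES = 30_000  # number of games from the question

def rows_after(days: int, games: int = GAMES) -> int:
    """Rows in a one-row-per-game-per-day table after `days` days."""
    return games * days

for label, days in [("1 day", 1), ("10 days", 10), ("1 year", 365), ("5 years", 5 * 365)]:
    print(f"{label:>8}: {rows_after(days):>12,} rows")
```

Five years comes to 54,750,000 rows in total, but a query for one game's graph touches at most 365 rows per year of history if it can use an index on `(game_id, day)`.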
The data in this table is needed for simple graphs (daily, weekly, monthly, and custom ranges).
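The weekly and monthly graphs can be built from the daily rows by bucketing. This is a sketch that assumes each row stores that day's increment rather than a running total (with running totals you would take the last value per bucket instead of summing). `bucket_key` and `aggregate` are hypothetical names; weeks follow ISO 8601 via `date.isocalendar()`.

```python
from collections import defaultdict
from datetime import date

def bucket_key(day: date, period: str) -> str:
    """Map a day to its graph bucket: 'day', 'week' (ISO week) or 'month'."""
    if period == "day":
        return day.isoformat()
    if period == "week":
        iso = day.isocalendar()  # (ISO year, ISO week, weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    if period == "month":
        return f"{day.year}-{day.month:02d}"
    raise ValueError(f"unknown period: {period!r}")

def aggregate(rows, period: str, start: date, end: date):
    """Sum (day, views, plays, likes) rows per bucket within [start, end]."""
    totals = defaultdict(lambda: [0, 0, 0])
    for day, views, plays, likes in rows:
        if start <= day <= end:  # custom-range filter
            t = totals[bucket_key(day, period)]
            t[0] += views
            t[1] += plays
            t[2] += likes
    return {k: tuple(v) for k, v in sorted(totals.items())}
```

In practice the same grouping would usually be pushed into the SQL query, so only one point per bucket leaves the database.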
Do you have better ideas for storing this data? Would NoSQL be more suitable in this case? I'd really appreciate your advice.
Partitioning in PostgreSQL works well for large logs. First, create the parent table:
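Since the project is in Django, here is a sketch of the parent-table DDL built from Python; the resulting string could be run through Django's `connection.cursor()`. The table name `game_stats` and the column names are assumptions, not taken from the original answer. With inheritance partitioning the parent normally holds no rows itself; it only defines the columns the partitions inherit.

```python
PARENT = "game_stats"  # hypothetical table name

def parent_table_ddl(table: str = PARENT) -> str:
    """PostgreSQL DDL for the (normally empty) parent table."""
    return (
        f"CREATE TABLE {table} (\n"
        "    game_id integer NOT NULL,\n"
        "    views   integer NOT NULL DEFAULT 0,\n"
        "    plays   integer NOT NULL DEFAULT 0,\n"
        "    likes   integer NOT NULL DEFAULT 0,\n"
        "    day     date    NOT NULL\n"
        ");"
    )

print(parent_table_ddl())
```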
Next, create the partitions. Here, one partition per month would be a good size: roughly 900k rows each (30 days × 30,000 games).
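A sketch of generating one child table per month, following the `CHECK (...) INHERITS (parent)` pattern used for inheritance partitioning in PostgreSQL. The names `game_stats` and `game_stats_YYYY_MM` are hypothetical. Each range is half-open (`>=` first of the month, `<` first of the next month), so consecutive partitions neither overlap nor leave gaps.

```python
from datetime import date

def month_bounds(year: int, month: int):
    """First day of the month and first day of the following month."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end

def partition_ddl(year: int, month: int, parent: str = "game_stats") -> str:
    """DDL for one monthly child table.

    The CHECK constraint is what lets the planner skip this partition
    for queries outside its month.
    """
    start, end = month_bounds(year, month)
    name = f"{parent}_{year}_{month:02d}"
    return (
        f"CREATE TABLE {name} (\n"
        f"    CHECK (day >= DATE '{start.isoformat()}' AND day < DATE '{end.isoformat()}')\n"
        f") INHERITS ({parent});"
    )

for m in (1, 2, 3):
    print(partition_ddl(2014, m))
```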
Note the check constraint on each partition. If you try to insert a row into the wrong partition, the constraint rejects it:
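Because a row aimed at the wrong child table is rejected, something has to choose the right partition on insert. A common alternative is a trigger on the parent table (not shown). This sketch instead does it in the application. `partition_for`, `fits` and `insert_sql` are hypothetical helpers that mirror the monthly `game_stats_YYYY_MM` naming and half-open CHECK ranges assumed here. The `%s` placeholders follow the DB-API style used by Django and psycopg2.

```python
from datetime import date

def partition_for(day: date, parent: str = "game_stats") -> str:
    """Name of the monthly child table whose CHECK range contains `day`."""
    return f"{parent}_{day.year}_{day.month:02d}"

def fits(day: date, year: int, month: int) -> bool:
    """Mirror of a partition's CHECK: first of month <= day < first of next month."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start <= day < end

def insert_sql(day: date, parent: str = "game_stats") -> str:
    """Parameterised INSERT aimed directly at the correct child table."""
    return (
        f"INSERT INTO {partition_for(day, parent)} "
        "(game_id, views, plays, likes, day) VALUES (%s, %s, %s, %s, %s)"
    )
```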
One advantage of partitioning is that a query scans only the partitions that can contain matching rows. This drastically and consistently reduces the amount of data searched, however many years of data the table holds. Here is the EXPLAIN output for a search on a specific date:
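For inheritance partitioning, PostgreSQL uses the CHECK constraints to skip partitions. This is governed by the `constraint_exclusion` setting, whose default value `partition` applies it to inheritance child tables. The toy Python model below is not PostgreSQL's planner; it only illustrates the idea. Given each partition's half-open `[start, end)` range, only partitions overlapping the queried `[lo, hi]` range need scanning. The partition names are hypothetical.

```python
from datetime import date

def monthly_ranges(first_year: int, last_year: int):
    """Hypothetical partition map: name -> [start, end) for every month."""
    ranges = {}
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            start = date(year, month, 1)
            end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
            ranges[f"game_stats_{year}_{month:02d}"] = (start, end)
    return ranges

def partitions_to_scan(ranges, lo: date, hi: date):
    """Partitions whose CHECK range can contain a day in [lo, hi]."""
    # [start, end) overlaps [lo, hi] iff start <= hi and lo < end.
    return sorted(name for name, (start, end) in ranges.items()
                  if start <= hi and lo < end)
```

With five years of monthly partitions, a single-day lookup still touches only one of the 60 children, which is why the cost stays flat as history grows.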
Notice that, apart from the parent table, only the matching partition was scanned. You can also create indexes on the partitions to avoid a sequential scan.
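With inheritance partitioning, an index created on the parent is not propagated to the children, so each partition needs its own. This is a sketch that generates one composite index per monthly partition. The `(game_id, day)` column choice assumes graphs are drawn per game over a date range, and all names are hypothetical.

```python
def index_ddl(partition: str) -> str:
    """Composite index so per-game, per-date-range queries avoid a seq scan."""
    return f"CREATE INDEX {partition}_game_day_idx ON {partition} (game_id, day);"

def indexes_for_year(year: int, parent: str = "game_stats"):
    """One index statement for each monthly child table of `year`."""
    return [index_ddl(f"{parent}_{year}_{m:02d}") for m in range(1, 13)]

for stmt in indexes_for_year(2014)[:2]:
    print(stmt)
```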
Inheritance Partitioning