I need an advice on optimal approach to store statistical data. There is a

Question

0

Asked: June 13, 20262026-06-13T21:59:30+00:00 2026-06-13T21:59:30+00:00

I need an advice on optimal approach to store statistical data. There is a

0

I need an advice on optimal approach to store statistical data. There is a project on Django, which has a database (mysql) of 30 000 online games.

Each game has three statistical parameters:

number of views,
number of plays,
number of likes

Now I need to store historical data for these three parameters on a daily basis, so I was thinking on creating a single database which will has five columns:

gameid, number of views, plays, likes, date (day-month-year data).

So in the end, every day for every game will be logged in one row, so in one day this table will have 30000 rows, in 10 days it will have size of 300000 and in a year it will have size of 10 950 000 rows. I’m not a big specialist in DBA stuff, but this says me, that this quickly will become a performance problem. I’m not talking what will happen in 5 years time.
The data collected in this table is needed for simple graphs

(daily, weekly, monthly, custom range).

Maybe you have better ideas on how to store this data? Maybe noSQL will be more suitable in this case? Really need your advice on this.d

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-13T21:59:31+00:00

Partitioning in postgresql works great for big logs. First create the parent table:

create table  game_history_log (
    gameid integer,
    views integer,
    plays integer,
    likes integer,
    log_date date
);

Now create the partitions. In this case one for each month, 900 k rows, would be good:

create table game_history_log_201210 (
    check (log_date between '2012-10-01' and '2012-10-31')
) inherits (game_history_log);

create table game_history_log_201211 (
    check (log_date between '2012-11-01' and '2012-11-30')
) inherits (game_history_log);

Notice the check constraints in each partition. If you try to insert in the wrong partition:

insert into game_history_log_201210 (
    gameid, views, plays, likes, log_date
) values (1, 2, 3, 4, '2012-09-30');
ERROR:  new row for relation "game_history_log_201210" violates check constraint "game_history_log_201210_log_date_check"
DETAIL:  Failing row contains (1, 2, 3, 4, 2012-09-30).

One of the advantages of partitioning is that it will only search in the correct partition reducing drastically and consistently the search size regardless of how many years of data there is. Here the explain for the search for a certain date:

explain
select *
from game_history_log
where log_date = date '2012-10-02';
                                              QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
 Result  (cost=0.00..30.38 rows=9 width=20)
   ->  Append  (cost=0.00..30.38 rows=9 width=20)
         ->  Seq Scan on game_history_log  (cost=0.00..0.00 rows=1 width=20)
               Filter: (log_date = '2012-10-02'::date)
         ->  Seq Scan on game_history_log_201210 game_history_log  (cost=0.00..30.38 rows=8 width=20)
               Filter: (log_date = '2012-10-02'::date)

Notice that apart from the parent table it only scanned the correct partition. Obviously you can have indexes on the partitions to avoid a sequential scan.

Inheritance Partitioning

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I need an advice on optimal approach to store statistical data. There is a

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply