I have a table of news articles that contains, among other columns, the author, the posting time and the word count of each article. The table is rather large: it holds more than one million rows and grows by about 10,000 rows per day.
Based on this data, I run a statistical analysis to determine the total number of words a given author published in a given time window (e.g. each hour of each day, each day, each month), together with an average over a preceding time span. Here are two examples:
- Author A published 3298 words on 2011-11-04 and an average of 943.2 words per day over the two months before (from 2011-09-04 to 2011-11-03)
- Author B published 435 words on 2012-01-21 between 1pm and 2pm, and an average of 163.94 words per day between 1pm and 2pm over the 30 days before
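The two statistics in these examples can be sketched in plain Python as follows. The sketch assumes the rows are `(author, posted_at, word_count)` tuples and that days without any articles count as zero. Because of that second assumption, the divisor is always the full number of days in the span. The question does not say how empty days are treated. The sample rows and the function names `words_on_day` and `trailing_daily_average` are hypothetical, and a 30-day span is used for brevity.

```python
from datetime import date, datetime, timedelta

# Hypothetical sample rows: (author, posted_at, word_count)
ARTICLES = [
    ("A", datetime(2011, 11, 4, 9, 15), 1200),
    ("A", datetime(2011, 11, 4, 17, 40), 2098),
    ("A", datetime(2011, 11, 3, 10, 0), 900),
    ("A", datetime(2011, 10, 20, 8, 30), 600),
    ("B", datetime(2011, 11, 4, 13, 5), 435),
]


def words_on_day(rows, author, day):
    """Total words `author` published on the calendar day `day`."""
    return sum(w for a, ts, w in rows if a == author and ts.date() == day)


def trailing_daily_average(rows, author, day, n_days):
    """Average words per day over the n_days days strictly before `day`.

    Assumption: days without articles count as zero, so the divisor is n_days.
    """
    start = day - timedelta(days=n_days)
    total = sum(
        w for a, ts, w in rows
        if a == author and start <= ts.date() < day
    )
    return total / n_days


day = date(2011, 11, 4)
print(words_on_day(ARTICLES, "A", day))                # 3298
print(trailing_daily_average(ARTICLES, "A", day, 30))  # 50.0 (1500 words / 30 days)
```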
Currently, a cron job starts a script at the end of each time window. The script calculates the sum and the averages and stores them in a separate table for each kind of window (one for hourly windows, one for daily, one for monthly, etc.).
Since sums and averages are easy to compute in SQL, I think views might be a more elegant solution, but I don't know what the performance implications would be.
Are Views an appropriate solution to the problem described above?
I think you can use materialized views for this. MySQL does not implement them natively, but you can emulate them with ordinary tables. Look at
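A minimal sketch of one way to emulate a materialized view with a table follows. An `AFTER INSERT` trigger keeps a per-author, per-day summary table up to date as each article arrives, so readers query precomputed rows. SQLite stands in for MySQL. In MySQL, a comparable trigger would more likely use a single `INSERT ... ON DUPLICATE KEY UPDATE`. The names `daily_words_mv` and `articles_ai` are hypothetical. The sketch only handles inserts; updates and deletes of articles would need their own triggers. Compared with the cron approach, the trigger moves a small cost onto every one of the roughly 10,000 daily inserts instead of running one batch per window.

```python
import sqlite3

SCHEMA = """
CREATE TABLE articles (
    author TEXT NOT NULL,
    posted_at TEXT NOT NULL,          -- 'YYYY-MM-DD HH:MM:SS'
    word_count INTEGER NOT NULL
);
-- Ordinary table playing the role of a materialized view.
CREATE TABLE daily_words_mv (
    author TEXT NOT NULL,
    day TEXT NOT NULL,
    words INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (author, day)
);
-- Keep the summary in step with every new article (inserts only).
CREATE TRIGGER articles_ai AFTER INSERT ON articles
BEGIN
    INSERT OR IGNORE INTO daily_words_mv (author, day, words)
    VALUES (NEW.author, date(NEW.posted_at), 0);
    UPDATE daily_words_mv
    SET words = words + NEW.word_count
    WHERE author = NEW.author AND day = date(NEW.posted_at);
END;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO articles VALUES (?, ?, ?)", [
    ("A", "2011-11-04 09:15:00", 1200),
    ("A", "2011-11-04 17:40:00", 2098),
    ("A", "2011-11-03 10:00:00", 900),
])
print(conn.execute("SELECT * FROM daily_words_mv ORDER BY author, day").fetchall())
# [('A', '2011-11-03', 900), ('A', '2011-11-04', 3298)]
```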