I’m building a website with user-generated content. On the home page I want to show a list of all created items, and I want to be able to sort them by a view counter. That sounds easy, but I want multiple counters: I want to know which item was the most visited in the last day, the last week, the last month, or overall.
My first idea was to create four counter columns in the items’ DB table, one each for daily, weekly, monthly and overall, and then create a cron job that clears the daily counter every 24 hours, the weekly counter every 7 days, and so on.
But my problem with this is, what happens if I want to know which was the most viewed item of the week, just after the weekly counter got cleared?
What I need is an efficient way to create a continuous counter, which is decremented for every page view that has become too old and incremented for every new page view.
Right now I’m thinking of a solution using Redis, but I don’t have anything concrete yet.
I’m just looking for a general idea here, but FYI I’m developing this application in Ruby on Rails.
What I would suggest is tracking each hit on a page as its own row, with a timestamp, the user ID, and whatever else you want to store. Then you can calculate the counters however you like, and change that later, because you have the data in a simple-to-use format. A table with entity (page), user ID and timestamp should be good. Just add a row to it whenever the page is requested.
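As a rough sketch of that idea, here is a minimal example in Python with the built-in `sqlite3` module standing in for your real database. The table name `hits`, its columns, the helper names and the 30-day "month" are all assumptions made for illustration. Because every window is just a different lower bound on the timestamp, nothing ever has to be reset.

```python
import sqlite3
import time

DAY = 24 * 60 * 60
# Window lengths in seconds; a "month" is approximated as 30 days (assumption).
WINDOWS = {"day": DAY, "week": 7 * DAY, "month": 30 * DAY}


def create_schema(conn):
    # One row per page view: which item, which user (optional), and when.
    conn.execute(
        "CREATE TABLE hits ("
        " item_id INTEGER NOT NULL,"
        " user_id INTEGER,"
        " created_at INTEGER NOT NULL)"
    )
    # Index so that range scans on the timestamp stay cheap.
    conn.execute("CREATE INDEX idx_hits_created_item ON hits (created_at, item_id)")


def record_hit(conn, item_id, user_id=None, now=None):
    ts = int(time.time()) if now is None else now
    conn.execute(
        "INSERT INTO hits (item_id, user_id, created_at) VALUES (?, ?, ?)",
        (item_id, user_id, ts),
    )


def most_viewed(conn, window=None, limit=10, now=None):
    """Return (item_id, views) pairs, most viewed first; window=None means overall."""
    now = int(time.time()) if now is None else now
    since = 0 if window is None else now - WINDOWS[window]
    rows = conn.execute(
        "SELECT item_id, COUNT(*) AS views FROM hits"
        " WHERE created_at > ?"
        " GROUP BY item_id"
        " ORDER BY views DESC, item_id ASC"
        " LIMIT ?",
        (since, limit),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    record_hit(conn, item_id=1, user_id=42)
    record_hit(conn, item_id=1)
    record_hit(conn, item_id=2)
    print(most_viewed(conn, "week"))
```

The same `SELECT ... GROUP BY` query should carry over to MySQL more or less unchanged. In Rails you would presumably express it through your model layer instead.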
To reduce the number of inserts you can batch them together in your software. Constructing a multi-row INSERT, which MySQL supports, will save overhead. Your classes just need to construct the insert as described and hold the rows until it is sent. One idea is to flush not only on a timer but also after a fixed number of rows, which lets you say that if the server goes down you lose at most x rows of hits.
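A minimal sketch of such a batching class might look like this, again in Python with `sqlite3` standing in for MySQL. The class name `HitBuffer`, its parameters and the `hits` table layout are hypothetical. It flushes when either a row limit or a maximum age is reached, so at most `max_rows` hits are held in memory at any time.

```python
import sqlite3
import time


class HitBuffer:
    """Collects hits in memory and writes them with one multi-row INSERT."""

    def __init__(self, conn, max_rows=100, max_age_seconds=5.0, clock=time.monotonic):
        self.conn = conn
        self.max_rows = max_rows                # upper bound on rows lost in a crash
        self.max_age_seconds = max_age_seconds  # upper bound on how stale the table gets
        self.clock = clock                      # injectable so the timing can be tested
        self.rows = []
        self.oldest = None                      # clock value when the first buffered row arrived

    def add(self, item_id, user_id, created_at):
        if not self.rows:
            self.oldest = self.clock()
        self.rows.append((item_id, user_id, created_at))
        too_many = len(self.rows) >= self.max_rows
        too_old = self.clock() - self.oldest >= self.max_age_seconds
        if too_many or too_old:
            self.flush()

    def flush(self):
        """Write all buffered rows in a single statement; return how many were written."""
        if not self.rows:
            return 0
        placeholders = ", ".join(["(?, ?, ?)"] * len(self.rows))
        params = [value for row in self.rows for value in row]
        self.conn.execute(
            f"INSERT INTO hits (item_id, user_id, created_at) VALUES {placeholders}",
            params,
        )
        self.conn.commit()
        written = len(self.rows)
        self.rows = []
        self.oldest = None
        return written


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (item_id INTEGER, user_id INTEGER, created_at INTEGER)")
    buffer = HitBuffer(conn, max_rows=3)
    for item_id in (1, 1, 2, 3):
        buffer.add(item_id, None, int(time.time()))
    buffer.flush()  # write whatever is still pending, e.g. on shutdown
    print(conn.execute("SELECT COUNT(*) FROM hits").fetchone()[0])
```

Note that in this sketch the age check only runs when a new hit arrives, so a quiet site would also need a periodic `flush()` call from a timer or background job.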
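You could also use a MySQL trigger on the hit table to update reporting tables, so you don’t have to constantly query the main hit tracking table. Keep in mind that MySQL triggers fire once per inserted row, not once per batch, so if you want a single update after the full batch is done, run it from your application right after the multi-row insert.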
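One way to keep such a reporting table current is sketched below. It does the update from the application side rather than with a trigger, and rolls each batch of hits up into hourly buckets. The table name `hit_counts`, the one-hour bucket size and the function names are assumptions, and `sqlite3` is again only a stand-in for MySQL.

```python
import sqlite3
from collections import Counter

HOUR = 3600  # bucket size in seconds (assumption; pick what your reports need)


def create_report_table(conn):
    # One row per item and hour, holding the number of views in that hour.
    conn.execute(
        "CREATE TABLE hit_counts ("
        " item_id INTEGER NOT NULL,"
        " bucket_start INTEGER NOT NULL,"
        " views INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (item_id, bucket_start))"
    )


def roll_up(conn, batch):
    """Add a batch of (item_id, created_at) pairs to the hourly counters."""
    totals = Counter(
        (item_id, created_at - created_at % HOUR) for item_id, created_at in batch
    )
    for (item_id, bucket), views in totals.items():
        # Make sure the bucket row exists, then increment it.
        conn.execute(
            "INSERT OR IGNORE INTO hit_counts (item_id, bucket_start, views)"
            " VALUES (?, ?, 0)",
            (item_id, bucket),
        )
        conn.execute(
            "UPDATE hit_counts SET views = views + ?"
            " WHERE item_id = ? AND bucket_start = ?",
            (views, item_id, bucket),
        )
    conn.commit()


def top_items(conn, since, limit=10):
    """Most viewed items, counting only buckets that start at or after `since`."""
    return conn.execute(
        "SELECT item_id, SUM(views) AS views FROM hit_counts"
        " WHERE bucket_start >= ?"
        " GROUP BY item_id"
        " ORDER BY views DESC, item_id ASC"
        " LIMIT ?",
        (since, limit),
    ).fetchall()
```

Summing a day's worth of buckets touches at most 24 rows per item, and a week's worth at most 168, instead of every raw hit. The trade-off is that each window's edge is only accurate to the bucket size.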
Also, if this needs really high throughput, the hit tracking could be broken off into its own shard and accessed via Ajax calls, both to record hits and to fetch the counts.