Can anyone recommend a strategy for aggregating raw ‘click’ and ‘impression’ data stored in a MySQL table with over 100,000,000 rows?
Here is the table structure.
CREATE TABLE `clicks` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`companyid` int(11) DEFAULT '0',
`type` varchar(32) NOT NULL DEFAULT '',
`contextid` int(11) NOT NULL DEFAULT '0',
`period` varchar(16) NOT NULL DEFAULT '',
`timestamp` int(11) NOT NULL DEFAULT '0',
`location` varchar(32) NOT NULL DEFAULT '',
`ip` varchar(32) DEFAULT NULL,
`useragent` varchar(64) DEFAULT NULL,
`processed` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `type` (`type`),
KEY `companyid` (`companyid`),
KEY `period` (`period`),
KEY `contextid` (`contextid`)
) ENGINE=MyISAM AUTO_INCREMENT=21189 DEFAULT CHARSET=latin1;
What I want to do is make this data easier to work with. I want to extract weekly and monthly aggregates from it, grouped by type, companyid and contextid.
Ideally, I’d like to take this data off the production server, aggregate it and then merge it back.
I’m really in a bit of a pickle and wondered whether anyone had any good starting points or strategies for actually aggregating the data so that it can be queried quickly using MySQL. I do not require ‘real time’ reporting for this data.
I’ve tried batch PHP scripts in the past but this seemed quite slow.
You can implement a simple PHP script with the whole monthly/weekly data aggregation logic and make it execute via cron job at a given time. Depending on the software context, it could possibly be scheduled for running at night. Additionally, you could pass a GET parameter in the request for recognizing the request source.