I have been trying to work out a few reports based off some log

Question

Asked: June 5, 2026 (2026-06-05T00:19:46+00:00)

I have been trying to build a few reports from some log files (~50 million records, which could grow tenfold going forward). I have loaded the data into a table and made the necessary changes (removing dups, etc.). The final table is supposed to hold the number of requests per product, per type, and per day, so I am trying to cut the data down to just the distinct rows, with a count column representing the number of requests.

Here is the original table with the log data:

*************************** 1. row ***************************
       Table: cdnlog2
Create Table: CREATE TABLE `cdnlog2` (
  `serial` int(32) DEFAULT NULL,
  `ip` varchar(100) DEFAULT NULL,
  `country` varchar(100) DEFAULT NULL,
  `productid` int(11) DEFAULT NULL,
  `type` varchar(100) DEFAULT NULL,
  `query_date` date DEFAULT NULL,
  KEY `aaa` (`country`),
  KEY `ccc` (`productid`),
  KEY `type` (`type`),
  KEY `date_index` (`query_date`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

Destination table:

*************************** 1. row ***************************
       Table: cdnlogfinal
Create Table: CREATE TABLE `cdnlogfinal` (
  `country` varchar(100) DEFAULT NULL,
  `productid` int(11) DEFAULT NULL,
  `type` varchar(100) DEFAULT NULL,
  `request_count` int(11) DEFAULT NULL,
  `query_date` date DEFAULT NULL,
  KEY `aaa` (`country`),
  KEY `ccc` (`productid`),
  KEY `type` (`type`),
  KEY `date_index` (`query_date`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

I am now trying to reduce the number of records to grouped values: just the distinct rows and their counts (the log can contain dups, since the same product can be selected multiple times on the same day). However, the insert into the secondary table has been running for several hours with the status “Copying to tmp table on disk”. I have already changed the temp directory to allow for sufficient space. Any pointers?

Thanks in advance

1 Answer

Editorial Team · Answer 1 · 2026-06-05T00:19:47+00:00

Your idea is a good one, and the end result will speed up your reporting queries considerably. You just need one more piece to solve the puzzle:

The problem is that the base table has too many rows to build the whole derived table in one query. Grouping all ~50 million rows at once forces MySQL to build a very large temporary table on disk (the “Copying to tmp table on disk” status you are seeing), and a single statement over the full range can run for hours.

Instead, you must do this one day at a time:

-- aggregate one day of cdnlog2 into cdnlogfinal
insert into cdnlogfinal (country, productid, type, request_count, query_date)
select country, productid, type, count(*), query_date
from cdnlog2
where query_date = '2012-01-01'
group by country, productid, type, query_date;

Run this query separately for every day in your date range, changing the date in the where clause each time.

Once your historical data has been calculated, run this once per day for the previous day as part of your batch processing.
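
The day-by-day approach can be scripted rather than run by hand. Below is a minimal sketch in Python, under some loudly stated assumptions: it uses an in-memory SQLite database as a stand-in so it runs anywhere (against MySQL you would need a MySQL driver connection, a cursor, and that driver's placeholder style), the helper names `aggregate_day`, `backfill`, and `build_demo_db` are made up for illustration, and the delete-before-insert step is an addition of mine so that re-running a day does not double-count it.

```python
import sqlite3
from datetime import date, timedelta


def aggregate_day(conn, day):
    """Roll up one day of cdnlog2 into cdnlogfinal (hypothetical helper)."""
    iso = day.isoformat()
    # Assumption, not part of the answer above: clear the day first so that
    # re-running the job for the same day does not insert duplicate counts.
    conn.execute("DELETE FROM cdnlogfinal WHERE query_date = ?", (iso,))
    conn.execute(
        """
        INSERT INTO cdnlogfinal
            (country, productid, type, request_count, query_date)
        SELECT country, productid, type, COUNT(*), query_date
        FROM cdnlog2
        WHERE query_date = ?
        GROUP BY country, productid, type, query_date
        """,
        (iso,),
    )
    conn.commit()


def backfill(conn, first_day, last_day):
    """Aggregate every day from first_day to last_day inclusive."""
    day = first_day
    while day <= last_day:
        aggregate_day(conn, day)
        day += timedelta(days=1)


def build_demo_db():
    """Create a tiny SQLite stand-in for the two tables, with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cdnlog2 (serial INTEGER, ip TEXT, country TEXT, "
        "productid INTEGER, type TEXT, query_date TEXT)"
    )
    conn.execute(
        "CREATE TABLE cdnlogfinal (country TEXT, productid INTEGER, "
        "type TEXT, request_count INTEGER, query_date TEXT)"
    )
    rows = [
        (1, "10.0.0.1", "US", 7, "download", "2012-01-01"),
        (2, "10.0.0.2", "US", 7, "download", "2012-01-01"),
        (3, "10.0.0.3", "DE", 7, "download", "2012-01-01"),
        (4, "10.0.0.1", "US", 7, "download", "2012-01-02"),
    ]
    conn.executemany("INSERT INTO cdnlog2 VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    backfill(conn, date(2012, 1, 1), date(2012, 1, 2))
    for row in conn.execute(
        "SELECT * FROM cdnlogfinal ORDER BY query_date, country"
    ):
        print(row)
```

The same `aggregate_day` call would serve both purposes: looped over the historical range once, then invoked from the nightly batch with yesterday's date.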

What you are building is effectively a small data warehouse. Strongly consider putting this data on a separate, dedicated server, so that heavy reporting queries do not compete with your production workload for CPU, memory, and disk I/O.
