The Archive Base


Editorial Team
Asked: June 5, 2026

I have been trying to work out a few reports based off some log

I have been trying to build a few reports from some log files (about 50 million records, which may grow tenfold going forward). I have loaded the data into a table and made the necessary changes (removing duplicates, etc.). The table is supposed to hold the number of requests per product, per type, and per day, so I am trying to cut it down to just the distinct products, with a count column representing the number of requests.

Here is the original table with the log data:

*************************** 1. row ***************************
       Table: cdnlog2
Create Table: CREATE TABLE `cdnlog2` (
  `serial` int(32) DEFAULT NULL,
  `ip` varchar(100) DEFAULT NULL,
  `country` varchar(100) DEFAULT NULL,
  `productid` int(11) DEFAULT NULL,
  `type` varchar(100) DEFAULT NULL,
  `query_date` date DEFAULT NULL,
  KEY `aaa` (`country`),
  KEY `ccc` (`productid`),
  KEY `type` (`type`),
  KEY `date_index` (`query_date`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

Destination table:

*************************** 1. row ***************************
       Table: cdnlogfinal
Create Table: CREATE TABLE `cdnlogfinal` (
  `country` varchar(100) DEFAULT NULL,
  `productid` int(11) DEFAULT NULL,
  `type` varchar(100) DEFAULT NULL,
  `request_count` int(11) DEFAULT NULL,
  `query_date` date DEFAULT NULL,
  KEY `aaa` (`country`),
  KEY `ccc` (`productid`),
  KEY `type` (`type`),
  KEY `date_index` (`query_date`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

I am now trying to reduce the number of records to grouped values, keeping just the distinct rows and their counts (the log can contain duplicates, since the same product can be selected multiple times on the same day). However, the insert into the secondary table has been running for several hours with the status "Copying to tmp table on disk". I have changed the temp directory to allow for sufficient space. Any pointers?

Thanks in advance

1 Answer

    Editorial Team
    Added an answer on June 5, 2026 at 12:19 am

    Your idea is a good one, and the end result will speed up your reporting queries considerably. You just need one more piece to solve the puzzle:

    The problem is that the base table has too many rows to build the whole derived table in one query. The GROUP BY has to aggregate every row at once, so the work spills into a temporary table on disk (the "Copying to tmp table on disk" status you are seeing) and the statement runs for hours.

    Instead, you must do this one day at a time:

    -- aggregate a single day of cdnlog2 into cdnlogfinal
    insert into cdnlogfinal (country, productid, type, request_count, query_date)
    select country, productid, type, count(*), query_date
    from cdnlog2
    where query_date = '2012-01-01'
    group by country, productid, type, query_date

    Run this query separately for every day in your date range, changing the date in the where clause accordingly.
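The day-by-day backfill could be driven by a small script. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the MySQL server, cut-down copies of the cdnlog2 and cdnlogfinal tables from the question, and invented sample rows. The helper names make_demo_db and aggregate_range are hypothetical. With a MySQL driver the placeholder would typically be %s rather than ?.

```python
import sqlite3
from datetime import date, timedelta


def make_demo_db():
    """Assumption: in-memory SQLite stands in for MySQL; sample rows are invented."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cdnlog2 (country TEXT, productid INTEGER,
                              type TEXT, query_date TEXT);
        CREATE TABLE cdnlogfinal (country TEXT, productid INTEGER, type TEXT,
                                  request_count INTEGER, query_date TEXT);
    """)
    rows = [
        ("US", 1, "view", "2012-01-01"),
        ("US", 1, "view", "2012-01-01"),      # duplicate request, same day
        ("DE", 2, "download", "2012-01-01"),
        ("US", 1, "view", "2012-01-02"),
    ]
    conn.executemany("INSERT INTO cdnlog2 VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


# One day of raw log rows collapsed into one row per (country, product, type).
AGGREGATE_ONE_DAY = """
    INSERT INTO cdnlogfinal (country, productid, type, request_count, query_date)
    SELECT country, productid, type, COUNT(*), query_date
    FROM cdnlog2
    WHERE query_date = ?
    GROUP BY country, productid, type, query_date
"""


def aggregate_range(conn, first_day, last_day):
    """Run the one-day aggregation for each day from first_day to last_day inclusive."""
    day = first_day
    while day <= last_day:
        conn.execute(AGGREGATE_ONE_DAY, (day.isoformat(),))
        conn.commit()  # each day is a small, independent unit of work
        day += timedelta(days=1)


conn = make_demo_db()
aggregate_range(conn, date(2012, 1, 1), date(2012, 1, 2))
result = conn.execute(
    "SELECT country, productid, type, request_count, query_date "
    "FROM cdnlogfinal ORDER BY query_date, country"
).fetchall()
for row in result:
    print(row)
```

Because each day is inserted on its own, a failed or interrupted run can be resumed from the last completed day instead of starting over.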

    Once your historic data is calculated, run this once per day for the previous day as part of your batch processing.
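For the recurring daily run, the only moving part is the date of the previous day. A minimal sketch, assuming the table and column names from the question; previous_day and daily_statement are hypothetical helper names, and the generated statement would still need to be executed by whatever scheduler or batch job you already use.

```python
from datetime import date, timedelta


def previous_day(today=None):
    """Return the calendar day before `today` (defaults to the current date)."""
    if today is None:
        today = date.today()
    return today - timedelta(days=1)


def daily_statement(day):
    """Build the one-day aggregation statement for `day` (a datetime.date)."""
    # isoformat() yields YYYY-MM-DD, the literal form MySQL accepts for DATE values.
    return (
        "INSERT INTO cdnlogfinal (country, productid, type, request_count, query_date) "
        "SELECT country, productid, type, COUNT(*), query_date "
        "FROM cdnlog2 "
        f"WHERE query_date = '{day.isoformat()}' "
        "GROUP BY country, productid, type, query_date"
    )


# Example: the statement a job running on 2012-01-02 would issue.
print(daily_statement(previous_day(date(2012, 1, 2))))
```

Interpolating the date into the string is acceptable here only because the value comes from datetime.date, not from user input.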

    What you are building is, in effect, a data warehouse. Strongly consider putting this data on a separate, dedicated server. There are many advantages to doing so, and it is worth reading up on them.

© 2021 The Archive Base. All Rights Reserved