Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 7936999
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 3, 20262026-06-03T22:23:45+00:00 2026-06-03T22:23:45+00:00

I have an analytics table (5M rows and growing) with the following structure Hits

  • 0

I have an analytics table (5M rows and growing) with the following structure

Hits 
  id int() NOT NULL AUTO_INCREMENT,
  hit_date datetime NOT NULL,
  hit_day int(11) DEFAULT NULL,
  gender varchar(255) DEFAULT NULL,
  age_range_id int(11) DEFAULT NULL,
  klout_range_id int(11) DEFAULT NULL,
  frequency int(11) DEFAULT NULL,
  count int(11) DEFAULT NULL,
  location_id int(11) DEFAULT NULL,
  source_id int(11) DEFAULT NULL,
  target_id int(11) DEFAULT NULL,

Most queries to the table is to query between two datetimes for a particular sub-set of columns and them sum up all the count column across all rows. For example:

SELECT target.id,
   SUM(CASE gender WHEN 'm' THEN count END) AS 'gender_male',
   SUM(CASE gender WHEN 'f' THEN count END) AS 'gender_female',
   SUM(CASE age_range_id WHEN 1 THEN count END) AS 'age_18 - 20',
   SUM(CASE target_id WHEN 1 then count END) AS 'target_test'
   SUM(CASE location_id WHEN 1 then count END) AS 'location_NY'
FROM Hits
WHERE (location_id =1 or location_id = 2)
  AND (target_id = 40 OR target_id = 22)
  AND cast(hit_date AS date) BETWEEN '2012-5-4'AND '2012-5-10'
GROUP BY target.id

The interesting thing about queries to this table is that the where clause include any permutation of Hit columns names and values since those are what we’re filtering against. So the particular query above is getting the # of males and females between the ages of 18 and 20 (age_range_id 1) in NY that belongs to a target called “test”. However, there are over 8 age groups, 10 klout ranges, 45 locations, 10 sources etc (all
foreign key references).

I currently have an index on hot_date and another one on target_id. What the best way to properly index this table?. Having a composite index on all column fields seems inherently wrong.

Is there any other way to run this query without using a sub-query to sum up all counts? I did some research and this seems to be the best way to get the data-set I need but is there a more efficient way of handling this query?

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-03T22:23:47+00:00Added an answer on June 3, 2026 at 10:23 pm

    Here’s your optimized query. The idea is to get rid of the ORs and the CAST() function on hit_date so that MySQL can utilize a compound index that covers each of the subsets of data. You’ll want a compound index on (location_id, target_id, hit_date) in that order.

    SELECT id, gender_male, gender_female, `age_18 - 20`, target_test, location_NY
    FROM
    (
    SELECT target.id,
       SUM(CASE gender WHEN 'm' THEN 1 END) AS gender_male,
       SUM(CASE gender WHEN 'f' THEN 1 END) AS gender_female,
       SUM(CASE age_range_id WHEN 1 THEN 1 END) AS `age_18 - 20`,
       SUM(CASE target_id WHEN 1 then 1 END) AS target_test,
       SUM(CASE location_id WHEN 1 then 1 END) AS location_NY
    FROM Hits
    WHERE (location_id =1)
      AND (target_id = 40)
      AND hit_date BETWEEN '2012-05-04 00:00:00' AND '2012-05-10 23:59:59'
    GROUP BY target.id
    
    UNION ALL
    
    SELECT target.id,
       SUM(CASE gender WHEN 'm' THEN 1 END) AS gender_male,
       SUM(CASE gender WHEN 'f' THEN 1 END) AS gender_female,
       SUM(CASE age_range_id WHEN 1 THEN 1 END) AS `age_18 - 20`,
       SUM(CASE target_id WHEN 1 then 1 END) AS target_test,
       SUM(CASE location_id WHEN 1 then 1 END) AS location_NY
    FROM Hits
    WHERE (location_id = 2)
      AND (target_id = 22)
      AND hit_date BETWEEN '2012-05-04 00:00:00' AND '2012-05-10 23:59:59'
    GROUP BY target.id
    
    UNION ALL
    
    SELECT target.id,
       SUM(CASE gender WHEN 'm' THEN 1 END) AS gender_male,
       SUM(CASE gender WHEN 'f' THEN 1 END) AS gender_female,
       SUM(CASE age_range_id WHEN 1 THEN 1 END) AS `age_18 - 20`,
       SUM(CASE target_id WHEN 1 then 1 END) AS target_test,
       SUM(CASE location_id WHEN 1 then 1 END) AS location_NY
    FROM Hits
    WHERE (location_id =1)
      AND (target_id = 22)
      AND hit_date BETWEEN '2012-05-04 00:00:00' AND '2012-05-10 23:59:59'
    GROUP BY target.id
    
    UNION ALL
    
    SELECT target.id,
       SUM(CASE gender WHEN 'm' THEN 1 END) AS gender_male,
       SUM(CASE gender WHEN 'f' THEN 1 END) AS gender_female,
       SUM(CASE age_range_id WHEN 1 THEN 1 END) AS `age_18 - 20`,
       SUM(CASE target_id WHEN 1 then 1 END) AS target_test,
       SUM(CASE location_id WHEN 1 then 1 END) AS location_NY
    FROM Hits
    WHERE (location_id = 2)
      AND (target_id = 22)
      AND hit_date BETWEEN '2012-05-04 00:00:00' AND '2012-05-10 23:59:59'
    GROUP BY target.id
    ) a
    GROUP BY id
    

    If your selection size is so large that this is no improvement, then you may as well keep scanning all rows like you’re already doing.

    Note, surround aliases with back ticks, not single quotes, which are deprecated. I also fixed your CASE clauses which had count instead of 1.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I have a table with several million rows. Each row represents a user session
Lets say I have a table containing several hundred million rows that looks something
We have a Web Analytics database with 2 tables Table 1 and Table 2
I have a table feeds that contains a collection of code-description rows. e.g. Key
I have a table of customers. This table can be big, millions of rows.
I have a table with about 1.4 billion records, with the following format: mysql>
There is a table with visits data: uid (INT) | created_at (DATETIME) I want
I have google analytics on my site. One page has a button which when
I have google analytics set on a site I'm working on, It has a
Does anybody know what google web master tools offer I have google analytics on

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.