The Archive Base

Editorial Team
Asked: June 1, 2026

I have a MySQL database with a few (five to be precise) huge tables.

I have a MySQL database with a few (five, to be precise) huge tables. It is essentially a data warehouse with a star topology. The table sizes range from 700 GB (the fact table) down to 1 GB, and the whole database comes to about 1 terabyte. I have been given the task of running analytics on these tables, which may include joins.
A simple analytical query on this database might be “find the number of smokers per state and display it in descending order”. This requirement could be expressed as a simple query like:

select state, count(smokingStatus) as smokers
from abc
where smokingStatus = 'current smoker'
group by state
order by smokers desc;

This query (and many others of the same nature) takes a long time to execute on this database: on the order of tens of hours.

This database is also heavily used for inserts: every few minutes, thousands of rows are added.

In such a scenario, how can I tackle this querying problem?
I have looked into Cassandra, which seemed easy to implement, but I am not sure it will be as easy to run analytical queries on it, especially when I have to use WHERE clauses and GROUP BY constructs.

I have also looked into Hadoop, but I am not sure how I can implement RDBMS-type queries on it. I am also not sure I want to invest right away in at least three machines for the name node, ZooKeeper and data nodes. Above all, our company prefers Windows-based solutions.

I have also thought of pre-computing all the data into simpler summary tables, but that limits my ability to run different kinds of queries.

Are there any other ideas I could implement?

EDIT

The MySQL environment is set up as follows:

1) master-slave setup
2) master for inserts/updates
3) slave for reads and running stored procedures
4) all tables are InnoDB with file-per-table
5) indexes on string as well as int columns

Pre-calculating values is an option, but the requirements for this kind of ad-hoc aggregated value keep changing.


1 Answer

  1. Editorial Team
    Added an answer on June 1, 2026 at 4:49 am

    Looking at this from the position of attempting to make MySQL work better rather than positing an entirely new architectural system:

    Firstly, verify what’s really happening. EXPLAIN the queries which are causing issues, rather than guessing what’s going on.
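As a rough illustration of checking the plan rather than guessing, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in so that it runs anywhere; SQLite's `EXPLAIN QUERY PLAN` is not MySQL's `EXPLAIN`, and the output format differs, but the workflow (look at the plan, add an index, look again) is the same. The tiny `abc` table, its sample rows and the index name `idx_status_state` are assumptions made up for the example.

```python
import sqlite3

# Hypothetical miniature of the "abc" table from the question (in-memory).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE abc (id INTEGER PRIMARY KEY, state TEXT, smokingStatus TEXT)"
)
rows = [
    ("NY", "current smoker"),
    ("NY", "never smoked"),
    ("CA", "current smoker"),
    ("CA", "current smoker"),
    ("TX", "recently quit"),
]
conn.executemany("INSERT INTO abc (state, smokingStatus) VALUES (?, ?)", rows)

QUERY = """
SELECT state, COUNT(*) AS smokers
FROM abc
WHERE smokingStatus = 'current smoker'
GROUP BY state
ORDER BY smokers DESC
"""


def plan(connection, sql):
    """Return the human-readable plan steps SQLite reports for sql."""
    # The fourth column of each EXPLAIN QUERY PLAN row is the detail text.
    return [row[3] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)]


# Plan without a usable index: the whole table has to be scanned.
plan_before = plan(conn, QUERY)

# Assumed composite index: filter column first, grouping column second.
conn.execute("CREATE INDEX idx_status_state ON abc (smokingStatus, state)")
plan_after = plan(conn, QUERY)

result = conn.execute(QUERY).fetchall()

print(plan_before)
print(plan_after)
print(result)
```

On the real system the equivalent step would be running `EXPLAIN` on the slow statement against the MySQL slave and checking which index, if any, each table access uses.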

    Having said that, I’m going to guess as to what’s going on since I don’t have the query plans. I’m guessing that (a) your indexes aren’t being used correctly and you’re getting a bunch of avoidable table scans, (b) your DB servers are tuned for OLTP, not analytical queries, (c) writing data while reading is causing things to slow down greatly, (d) working with strings just sucks and (e) you’ve got some inefficient queries with horrible joins (everyone has some of these).

    To improve things, I’d investigate the following (in roughly this order):

    • Check the query plans, make sure the existing indexes are being used correctly – look at the table scans, make sure the queries actually make sense.

    • Move the analytical queries off the OLTP system – the tunings required for fast inserts and short queries are very different to those for the sorts of queries which potentially read most of a large table. This might mean having another analytic-only slave, with a different config (and possibly table types – I’m not sure what the state of the art with MySQL is right now).

    • Move the strings out of the fact table – rather than having the smoking status column with string values of (say) ‘current smoker’, ‘recently quit’, ‘quit 1+ years’, ‘never smoked’, push these values out to another table, and have the integer keys in the fact table (this will help the sizes of the indexes too).

    • Stop the tables from being updated while the queries are running – if the indexes are moving while the query is running I can’t see good things happening. It’s (luckily) been a long time since I cared about MySQL replication, so I can’t remember if you can batch up the writes to the analytical query slave without too much drama.

    • If you get to this point without solving the performance issues, then it’s time to think about moving off MySQL. I’d look at Infobright first: it’s open source/$$ and based on MySQL, so it’s probably the easiest to put into your existing system (make sure the data is going to the Infobright DB, point your analytical queries to the Infobright server, keep the rest of the system as it is, job done). Vertica would be another option if it ever releases its Community Edition. Hadoop+Hive has a lot of moving parts; it’s pretty cool (and great on the resume), but if it’s only going to be used for the analytic portion of your system it may take more care and feeding than other options.
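The suggestion above about moving strings out of the fact table could look roughly like the sketch below. It again uses Python's sqlite3 module purely so the example is self-contained; the table names `smoking_status` and `fact`, the column names and the sample rows are all hypothetical. In MySQL the key column could be a small integer type such as TINYINT UNSIGNED.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Lookup (dimension) table: each status string is stored exactly once.
    CREATE TABLE smoking_status (
        id    INTEGER PRIMARY KEY,
        label TEXT NOT NULL UNIQUE
    );
    -- Fact table: carries only the small integer key, not the string.
    CREATE TABLE fact (
        id                INTEGER PRIMARY KEY,
        state             TEXT NOT NULL,
        smoking_status_id INTEGER NOT NULL REFERENCES smoking_status(id)
    );
    -- The index is built over integers, so it stays smaller than one on strings.
    CREATE INDEX idx_fact_status_state ON fact (smoking_status_id, state);
    """
)

labels = ["current smoker", "recently quit", "quit 1+ years", "never smoked"]
conn.executemany(
    "INSERT INTO smoking_status (label) VALUES (?)", [(label,) for label in labels]
)


def status_id(connection, label):
    """Look up the integer key for a status label."""
    row = connection.execute(
        "SELECT id FROM smoking_status WHERE label = ?", (label,)
    ).fetchone()
    return row[0]


facts = [
    ("NY", "current smoker"),
    ("NY", "never smoked"),
    ("CA", "current smoker"),
    ("CA", "current smoker"),
    ("TX", "quit 1+ years"),
]
conn.executemany(
    "INSERT INTO fact (state, smoking_status_id) VALUES (?, ?)",
    [(state, status_id(conn, label)) for state, label in facts],
)

# Resolve the label to its key once, then filter the fact table by integer.
smokers_by_state = conn.execute(
    """
    SELECT state, COUNT(*) AS smokers
    FROM fact
    WHERE smoking_status_id = ?
    GROUP BY state
    ORDER BY smokers DESC
    """,
    (status_id(conn, "current smoker"),),
).fetchall()

print(smokers_by_state)
```

Resolving the label once up front keeps the join out of the large scan entirely; joining to the lookup table inside the query is the other obvious option when several labels are needed in the output.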

© 2021 The Archive Base. All Rights Reserved