We’ve just built a system that rolls up its data at midnight. It must

Asked: May 26, 2026, 01:56:04 UTC

We’ve just built a system that rolls up its data at midnight. It must iterate through several combinations of tables to roll up the data it needs. Unfortunately, the UPDATE queries are taking forever. We have 1/1000th of our forecasted user base, and with just our beta users the daily rollup already takes 28 minutes.

Since the main lag is in the UPDATE queries, it may be hard to offload the data processing to other servers. What other options are there for optimizing millions of UPDATE queries? Is my scaling issue in the code below?

        // Count rows per (ab_id, persistence) inside the date window.
        $sql = "SELECT ab_id, persistence, COUNT(*) AS no_x FROM $query_table ftbl
                WHERE ftbl.$query_col > '$date_before' AND ftbl.$query_col <= '$date_end'
                GROUP BY ab_id, persistence";

        $data_list = DatabaseManager::getResults($sql);

        if (!empty($data_list)) {
            foreach ($data_list as $data) {
                $ab_id = $data['ab_id'];
                $no_x = $data['no_x'];
                $measure = $data['persistence'];

                // Check whether a rollup row already exists for this key and day.
                $sql = "SELECT ab_id FROM $rollup_table WHERE ab_id = $ab_id AND rollup_key = '$measure' AND rollup_date = '$day_date'";
                if (DatabaseManager::getVar($sql)) {
                    // Existing row: overwrite the stored count.
                    $sql = "UPDATE $rollup_table SET $rollup_col = $no_x WHERE ab_id = $ab_id AND rollup_key = '$measure' AND rollup_date = '$day_date'";
                    DatabaseManager::update($sql);
                } else {
                    // No row yet: insert a new one.
                    $sql = "INSERT INTO $rollup_table (ab_id, rollup_key, $rollup_col, rollup_date) VALUES ($ab_id, '$measure', $no_x, '$day_date')";
                    DatabaseManager::insert($sql);
                }
            }
        }

1 Answer

Editorial Team · Answer 1 · 2026-05-26T01:56:05+00:00

When addressing SQL scaling issues, it is always best to benchmark your problematic SQL. Benchmarking at the PHP level is fine in this case, since you’re running your queries from PHP.
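A minimal sketch of PHP-level benchmarking, assuming PHP 7.4 or later (for `hrtime()`, typed properties and arrow functions). The names `timeQuery()` and `QueryTimer` are hypothetical. Wrapping each database call in `measure()` under a label such as `select`, `update` or `insert` accumulates the time per kind of statement, which gives the percentage breakdown asked about in the questions that follow.

```php
<?php
// Sketch: time database calls from PHP and report where the time goes.
// Assumes PHP >= 7.4; timeQuery() and QueryTimer are hypothetical names.

/** Runs $fn once and returns its result plus elapsed wall-clock time in milliseconds. */
function timeQuery(callable $fn): array
{
    $start = hrtime(true); // monotonic timer, in nanoseconds
    $result = $fn();
    return ['result' => $result, 'ms' => (hrtime(true) - $start) / 1e6];
}

/** Accumulates time per label ('select', 'update', 'insert', ...) across a whole run. */
final class QueryTimer
{
    /** @var array<string, float> total milliseconds per label */
    private array $totals = [];

    /** Times $fn under $label and passes its return value through. */
    public function measure(string $label, callable $fn)
    {
        $timed = timeQuery($fn);
        $this->totals[$label] = ($this->totals[$label] ?? 0.0) + $timed['ms'];
        return $timed['result'];
    }

    /** @return array<string, float> share of the total time per label, in percent */
    public function breakdown(): array
    {
        $sum = array_sum($this->totals);
        if ($sum <= 0) {
            return array_fill_keys(array_keys($this->totals), 0.0);
        }
        $shares = [];
        foreach ($this->totals as $label => $ms) {
            $shares[$label] = 100.0 * $ms / $sum;
        }
        return $shares;
    }
}
```

In the question’s loop this might look like `$exists = $timer->measure('select', fn () => DatabaseManager::getVar($sql));`, with a call to `breakdown()` once the run finishes.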

If your first query can return millions of records, you may be better served running that work as a MySQL stored procedure. That minimizes the amount of data transferred between the database server and the PHP application server. Even if both run on the same machine, you may still see a noticeable improvement, because the rows no longer have to be passed through the client connection into PHP.

Some questions to consider that may help resolve your issue:

How long do your SELECT queries take to process without the UPDATE or INSERT statements?
What is the percentage breakdown of time across your queries, split between the SELECT statements and the INSERT and UPDATE statements? It will be easier to help identify solutions with that information.
Among those, is there a larger bottleneck whose removal would resolve your performance issues?
Is it necessary to iterate through your data at the PHP source-code level rather than the MySQL stored procedure level?
Is there a necessity to iterate procedurally through your records, or is it possible to accomplish the same thing through set-based operations?
Does your rollup_table have an index that covers the columns in the UPDATE query’s WHERE clause (ab_id, rollup_key, rollup_date)?

Also, the SELECT query run right before your UPDATE query appears to have an identical WHERE clause, which looks redundant. If you can evaluate that condition only once, you remove one of the two statements each row currently issues against rollup_table, which should shave a lot of time off your largest bottleneck.
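One way to evaluate that condition only once is MySQL’s `INSERT ... ON DUPLICATE KEY UPDATE`, which inserts a row or, when the row collides with a unique key, updates it instead. This is a sketch under stated assumptions: the rollup table has a UNIQUE index on `(ab_id, rollup_key, rollup_date)` (without it, the update branch never fires and duplicate rows are inserted), PDO prepared statements stand in for the question’s `DatabaseManager` class, and the helper names are hypothetical.

```php
<?php
// Sketch: one upsert per aggregated row instead of SELECT followed by UPDATE or INSERT.
// ASSUMPTION: the rollup table has a UNIQUE index on (ab_id, rollup_key, rollup_date).
// quoteIdentifier(), buildRowUpsertSql() and upsertRollupRows() are hypothetical helpers.

/** Backtick-quotes a table/column name after checking it is a plain identifier. */
function quoteIdentifier(string $name): string
{
    // Identifiers cannot be bound as parameters, so reject anything unusual.
    if (preg_match('/^[A-Za-z0-9_]+$/', $name) !== 1) {
        throw new InvalidArgumentException("Unsafe identifier: $name");
    }
    return "`$name`";
}

/** Inserts the row, or overwrites $rollupCol if the unique key already exists. */
function buildRowUpsertSql(string $rollupTable, string $rollupCol): string
{
    $t = quoteIdentifier($rollupTable);
    $c = quoteIdentifier($rollupCol);
    return "INSERT INTO $t (ab_id, rollup_key, $c, rollup_date) "
        . "VALUES (:ab_id, :rollup_key, :no_x, :rollup_date) "
        . "ON DUPLICATE KEY UPDATE $c = VALUES($c)";
}

/** Executes one prepared upsert per row returned by the aggregate query. */
function upsertRollupRows(PDO $pdo, string $rollupTable, string $rollupCol, array $rows, string $dayDate): void
{
    $stmt = $pdo->prepare(buildRowUpsertSql($rollupTable, $rollupCol)); // prepare once, reuse per row
    foreach ($rows as $row) {
        $stmt->execute([
            ':ab_id' => $row['ab_id'],
            ':rollup_key' => $row['persistence'],
            ':no_x' => $row['no_x'],
            ':rollup_date' => $dayDate,
        ]);
    }
}
```

This keeps the per-row loop but halves the statements per row; `VALUES(col)` inside `ON DUPLICATE KEY UPDATE` is deprecated as of MySQL 8.0.20 but still accepted.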

If you’re unfamiliar with writing MySQL stored procedures, the process is quite simple; see http://www.mysqltutorial.org/getting-started-with-mysql-stored-procedures.aspx for an example, and MySQL’s own documentation covers them well too. A stored procedure is a routine stored in and executed by the MySQL server itself, so it may improve performance for queries that potentially return millions of rows, since those rows never have to leave the server.
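A minimal sketch of creating and calling a stored procedure from PHP via PDO. Everything named here is hypothetical: the tables `events` (with a `created_at` column) and `rollup_daily`, the procedure `rollup_day`, and a UNIQUE index on `(ab_id, rollup_key, rollup_date)` in `rollup_daily`. The `DELIMITER` line seen in many tutorials is a directive of the `mysql` command-line client, so when the DDL is sent through PDO the whole `CREATE PROCEDURE ... END` text goes over as one statement. Because the table names are fixed, each table combination the question iterates over would need its own procedure (or dynamic SQL with `PREPARE`/`EXECUTE` inside the procedure).

```php
<?php
// Sketch: a stored procedure created and called through PDO.
// ASSUMPTIONS (all hypothetical): tables `events` (ab_id, persistence, created_at) and
// `rollup_daily` (ab_id, rollup_key, no_x, rollup_date) with a UNIQUE index on
// (ab_id, rollup_key, rollup_date); procedure name rollup_day.

// No delimiter directive is needed here: that belongs to the mysql command-line client, not SQL.
const ROLLUP_PROCEDURE_DDL = <<<'SQL'
CREATE PROCEDURE rollup_day(IN p_date_before DATETIME, IN p_date_end DATETIME, IN p_day DATE)
BEGIN
    -- Aggregate and upsert in one statement; no rows are sent to the client.
    INSERT INTO rollup_daily (ab_id, rollup_key, no_x, rollup_date)
    SELECT agg.ab_id, agg.persistence, agg.no_x, p_day
    FROM (
        SELECT ab_id, persistence, COUNT(*) AS no_x
        FROM events
        WHERE created_at > p_date_before AND created_at <= p_date_end
        GROUP BY ab_id, persistence
    ) AS agg
    ON DUPLICATE KEY UPDATE no_x = VALUES(no_x);
END
SQL;

/** (Re)creates the procedure; intended to run once at deploy time, not nightly. */
function installRollupProcedure(PDO $pdo): void
{
    $pdo->exec('DROP PROCEDURE IF EXISTS rollup_day');
    $pdo->exec(ROLLUP_PROCEDURE_DDL);
}

/** Runs the rollup for one date window entirely inside MySQL. */
function runRollupProcedure(PDO $pdo, string $dateBefore, string $dateEnd, string $day): void
{
    $stmt = $pdo->prepare('CALL rollup_day(?, ?, ?)');
    $stmt->execute([$dateBefore, $dateEnd, $day]);
    $stmt->closeCursor(); // CALL may return extra result sets; free them before the next query
}
```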

Set-based database operations are often faster than procedural ones, and SQL is a set-based language. You can update every row in a table with a single UPDATE statement: for example, UPDATE customers SET total_owing_to_us = 1000000 updates all rows in the customers table without a programmatic loop like the one in your sample code. If you have 100,000,000 customer rows, the set-based update will typically be much faster than a row-by-row procedural update. There are many useful resources online about this; a good Stack Overflow question to start with is “Why are relational set-based queries better than cursors?”.
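Applied to the question’s code, the whole loop could collapse into one set-based `INSERT ... SELECT ... ON DUPLICATE KEY UPDATE`, so both the aggregation and the upsert happen inside MySQL and no rows travel to PHP. This is a sketch, not a drop-in replacement. It assumes a UNIQUE index on `(ab_id, rollup_key, rollup_date)` in the rollup table, that the table and column names come from trusted code (identifiers cannot be bound as parameters, so they are checked against a plain-identifier pattern), and a PDO connection with named placeholders; `quoteIdentifier()` and `buildSetBasedRollupSql()` are hypothetical helpers.

```php
<?php
// Sketch: the whole rollup loop as a single set-based MySQL statement.
// ASSUMPTIONS: UNIQUE index on (ab_id, rollup_key, rollup_date) in the rollup table;
// identifiers come from trusted code; helper names are hypothetical.

/** Backtick-quotes a table/column name after checking it is a plain identifier. */
function quoteIdentifier(string $name): string
{
    if (preg_match('/^[A-Za-z0-9_]+$/', $name) !== 1) {
        throw new InvalidArgumentException("Unsafe identifier: $name");
    }
    return "`$name`";
}

/**
 * Aggregates the date window and upserts every (ab_id, persistence) count in one statement.
 * Bind :date_before, :date_end and :day_date when executing.
 */
function buildSetBasedRollupSql(string $queryTable, string $queryCol, string $rollupTable, string $rollupCol): string
{
    [$qt, $qc, $rt, $rc] = array_map('quoteIdentifier', [$queryTable, $queryCol, $rollupTable, $rollupCol]);
    return <<<SQL
INSERT INTO $rt (ab_id, rollup_key, $rc, rollup_date)
SELECT agg.ab_id, agg.persistence, agg.no_x, :day_date
FROM (
    SELECT ab_id, persistence, COUNT(*) AS no_x
    FROM $qt
    WHERE $qc > :date_before AND $qc <= :date_end
    GROUP BY ab_id, persistence
) AS agg
ON DUPLICATE KEY UPDATE $rc = VALUES($rc)
SQL;
}

// Usage with the question's variables (a PDO connection $pdo is assumed):
// $stmt = $pdo->prepare(buildSetBasedRollupSql($query_table, $query_col, $rollup_table, $rollup_col));
// $stmt->execute([':date_before' => $date_before, ':date_end' => $date_end, ':day_date' => $day_date]);
```

Each named placeholder appears exactly once, which keeps the statement valid whether or not PDO emulates prepares.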
