The Archive Base

Editorial Team
Asked: May 22, 2026


The schema

I have a MySQL database with one large table (say, 5 million rows). This table has several fields for the actual data, an optional comment field, and fields that record when the row was first added and when it was deleted. Simplified to a single “data” column, it looks like this:

+----+------+---------+---------+----------+
| id | data | comment | created | deleted  |
+----+------+---------+---------+----------+
| 1  | val1 | NULL    | 1       | 2        |
| 2  | val2 | nice    | 1       | NULL     |
| 3  | val3 | NULL    | 2       | NULL     |
| 4  | val4 | NULL    | 2       | 3        |
| 5  | val5 | NULL    | 3       | NULL     |
+----+------+---------+---------+----------+

This schema lets us look at any past version of the data using the created and deleted fields, e.g.:

SET @version=1;
SELECT data, comment FROM MyTable
WHERE created <= @version AND 
      (deleted IS NULL OR deleted > @version);

+------+---------+
| data | comment |
+------+---------+
| val1 | NULL    |
| val2 | nice    |
+------+---------+

The current version of the data can be fetched more simply:

SELECT data, comment FROM MyTable WHERE deleted IS NULL;

+------+---------+
| data | comment |
+------+---------+
| val2 | nice    |
| val3 | NULL    |
| val5 | NULL    |
+------+---------+
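The visibility rule behind both queries can be restated outside SQL, which is handy when checking results by hand. The following is only a small Python model of the sample rows shown above; the names `ROWS`, `rows_at_version` and `current_rows` are made up for illustration and are not part of the real system.

```python
# Sample rows from the table above: (id, data, comment, created, deleted).
# None stands in for SQL NULL.
ROWS = [
    (1, "val1", None, 1, 2),
    (2, "val2", "nice", 1, None),
    (3, "val3", None, 2, None),
    (4, "val4", None, 2, 3),
    (5, "val5", None, 3, None),
]


def rows_at_version(rows, version):
    """Return the (data, comment) pairs visible at the given version."""
    return [
        (data, comment)
        for _id, data, comment, created, deleted in rows
        if created <= version and (deleted is None or deleted > version)
    ]


def current_rows(rows):
    """Return the (data, comment) pairs that have not been marked deleted."""
    return [
        (data, comment)
        for _id, data, comment, _created, deleted in rows
        if deleted is None
    ]


print(rows_at_version(ROWS, 1))  # [('val1', None), ('val2', 'nice')]
print(current_rows(ROWS))
```

For the sample data the latest version is 3, so `rows_at_version(ROWS, 3)` and `current_rows(ROWS)` return the same rows.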

DDL:

CREATE TABLE `MyTable` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `data` varchar(32) NOT NULL,
  `comment` varchar(32) DEFAULT NULL,
  `created` int(11) NOT NULL,
  `deleted` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `data` (`data`,`comment`)
) ENGINE=InnoDB;

Updating

Periodically a new set of data and comments arrives. This can be fairly large, say half a million rows. I need to update MyTable so that this new data set is stored in it. This means:

  • “Deleting” old rows. Note the “scare quotes” – we don’t actually delete rows from MyTable. We have to set the deleted field to the new version N. This has to be done for all rows in MyTable that are in the previous version N-1, but are not in the new set.
  • Inserting new rows. All rows that are in the new set and are not in version N-1 in MyTable must be added as new rows with the created field set to the new version N, and deleted as NULL.

Some rows in the new set may match existing rows in MyTable at version N-1 in which case there is nothing to do.
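To pin down the intended semantics, here is a minimal in-memory reference model of one update, written as a sketch rather than as a fast implementation. The function name `apply_update` and the dict-based row layout are assumptions made for the example. Rows are compared on the (data, comment) pair, and Python treats `None == None` as true, which plays the role of a NULL-safe comparison.

```python
def apply_update(rows, new_rows, new_version):
    """Apply one update in place and return the row list.

    rows:     list of dicts with keys id, data, comment, created, deleted
    new_rows: iterable of (data, comment) tuples making up the new set
    """
    new_rows = list(new_rows)
    new_set = set(new_rows)
    # Snapshot of version N-1, taken before anything is changed.
    current = {(r["data"], r["comment"]) for r in rows if r["deleted"] is None}

    # "Delete": current rows that are missing from the new set.
    for r in rows:
        if r["deleted"] is None and (r["data"], r["comment"]) not in new_set:
            r["deleted"] = new_version

    # Insert: new rows that were not current, de-duplicated in input order.
    next_id = max((r["id"] for r in rows), default=0) + 1
    for data, comment in dict.fromkeys(new_rows):
        if (data, comment) not in current:
            rows.append({"id": next_id, "data": data, "comment": comment,
                         "created": new_version, "deleted": None})
            next_id += 1
    return rows


table = [
    {"id": 1, "data": "val1", "comment": None, "created": 1, "deleted": 2},
    {"id": 2, "data": "val2", "comment": "nice", "created": 1, "deleted": None},
    {"id": 3, "data": "val3", "comment": None, "created": 2, "deleted": None},
    {"id": 4, "data": "val4", "comment": None, "created": 2, "deleted": 3},
    {"id": 5, "data": "val5", "comment": None, "created": 3, "deleted": None},
]
apply_update(table, [("val2", "nice"), ("val3", "changed"), ("val6", None)], 4)
for row in table:
    print(row)
```

In this made-up version 4, val2 is untouched, val3 and val5 get deleted = 4, and two rows are added: val3 with its new comment, and val6.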

My current solution

Given that we have to “diff” two sets of data to work out the deletions, we can’t just read over the new data and do insertions as appropriate. I can’t think of a way to do the diff operation without dumping all the new data into a temporary table first. So my strategy goes like this:

-- temp table uses MyISAM for speed.
CREATE TEMPORARY TABLE tempUpdate (
    `data` char(32) NOT NULL,
    `comment` char(32) DEFAULT NULL,
    PRIMARY KEY (`data`),
    KEY (`data`, `comment`)
) ENGINE=MyISAM;

-- Bulk insert thousands of rows
INSERT INTO tempUpdate VALUES
    ('some new', NULL),
    ('other', 'comment'),
...

-- Start transaction for the update
BEGIN;
SET @newVersion = 5; -- Worked out out-of-band

-- Do the "deletions". The join selects all non-deleted rows in MyTable for
-- which the matching row in tempUpdate does not exist (tempUpdate.data is NULL)
UPDATE MyTable
    LEFT JOIN tempUpdate
    ON MyTable.data = tempUpdate.data AND
       MyTable.comment <=> tempUpdate.comment
    SET MyTable.deleted = @newVersion
    WHERE tempUpdate.data IS NULL AND
          MyTable.deleted IS NULL;

-- Delete all rows from the tempUpdate table that match rows in the current
-- version (deleted is null) to leave just new rows.
DELETE tempUpdate.*
    FROM MyTable RIGHT JOIN tempUpdate
    ON MyTable.data = tempUpdate.data AND
       MyTable.comment <=> tempUpdate.comment
    WHERE MyTable.id IS NOT NULL AND
          MyTable.deleted IS NULL;

-- All rows left in tempUpdate are new so add them.    
INSERT INTO MyTable (data, comment, created)
    SELECT DISTINCT tempUpdate.data, tempUpdate.comment, @newVersion
    FROM tempUpdate;

COMMIT;

DROP TEMPORARY TABLE IF EXISTS tempUpdate;
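The three steps above can be checked for correctness on a toy data set without a MySQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in, so it says nothing about MySQL timings. SQLite has no multi-table UPDATE or DELETE and no `<=>` operator, so the joins are rewritten with NOT EXISTS / EXISTS and the NULL-safe comparison uses SQLite's `IS`. The new rows and version number 4 are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTable (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    data TEXT NOT NULL,
    comment TEXT,
    created INTEGER NOT NULL,
    deleted INTEGER
);
CREATE TEMP TABLE tempUpdate (data TEXT NOT NULL, comment TEXT);
""")
conn.executemany(
    "INSERT INTO MyTable (data, comment, created, deleted) VALUES (?, ?, ?, ?)",
    [("val1", None, 1, 2), ("val2", "nice", 1, None), ("val3", None, 2, None),
     ("val4", None, 2, 3), ("val5", None, 3, None)],
)
conn.executemany(
    "INSERT INTO tempUpdate VALUES (?, ?)",
    [("val2", "nice"), ("val3", "changed"), ("val6", None)],
)

new_version = 4
with conn:  # commits on success, rolls back on error
    # Step 1: mark current rows that are absent from the new set as deleted.
    conn.execute("""
        UPDATE MyTable SET deleted = ?
        WHERE deleted IS NULL AND NOT EXISTS (
            SELECT 1 FROM tempUpdate t
            WHERE t.data = MyTable.data AND t.comment IS MyTable.comment)
    """, (new_version,))
    # Step 2: drop new rows that already exist in the current version.
    conn.execute("""
        DELETE FROM tempUpdate
        WHERE EXISTS (
            SELECT 1 FROM MyTable m
            WHERE m.data = tempUpdate.data
              AND m.comment IS tempUpdate.comment
              AND m.deleted IS NULL)
    """)
    # Step 3: whatever is left in tempUpdate is genuinely new.
    conn.execute("""
        INSERT INTO MyTable (data, comment, created)
        SELECT DISTINCT data, comment, ? FROM tempUpdate
    """, (new_version,))

result = conn.execute(
    "SELECT data, comment, created, deleted FROM MyTable ORDER BY data, created"
).fetchall()
for row in result:
    print(row)
```

Because the statements differ from the MySQL originals, this only confirms that the three-step logic produces the expected rows; any timing comparison still has to run against MySQL itself.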

The question (at last)

I need to find the fastest way to do this update operation. I can’t change the schema for MyTable, so any solution must work with that constraint. Can you think of a faster way to do the update operation, or suggest speed-ups to my existing method?

I have a Python script for testing the timings of different update strategies and checking their correctness over several versions. It’s fairly long, but I can edit it into the question if people think it would be useful.


1 Answer

  1. Editorial Team, answered May 22, 2026 at 8:38 pm

    One speed-up is on the loading side: fill tempUpdate with LOAD DATA INFILE rather than a multi-row INSERT.
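As a rough sketch of what that could look like from a Python script: write the new rows to a file in LOAD DATA's default format (tab-separated fields, one row per line, `\N` for NULL, backslash as the escape character), then issue the statement through whatever MySQL client is in use. The helper names and the path `/tmp/tempUpdate.tsv` are hypothetical, and the `LOCAL` variant assumes `local_infile` is enabled on both client and server.

```python
import os
import tempfile


def escape_field(value):
    """Encode one field for LOAD DATA's default tab-separated format."""
    if value is None:
        return "\\N"  # LOAD DATA reads \N as SQL NULL
    # Escape the backslash first, then the field and line terminators.
    return value.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")


def write_load_file(rows, path):
    """Write (data, comment) pairs, one row per line, to the given path."""
    # newline="" stops Python from translating "\n" on Windows.
    with open(path, "w", encoding="utf-8", newline="") as fh:
        for data, comment in rows:
            fh.write(escape_field(data) + "\t" + escape_field(comment) + "\n")


# Statement to run after the file is written (the path is a placeholder):
LOAD_SQL = (
    "LOAD DATA LOCAL INFILE '/tmp/tempUpdate.tsv' "
    "INTO TABLE tempUpdate (data, comment)"
)

# Demo: write two rows to a scratch file and show what MySQL would read.
fd, demo_path = tempfile.mkstemp(suffix=".tsv")
os.close(fd)
write_load_file([("some new", None), ("other", "comment")], demo_path)
with open(demo_path, encoding="utf-8", newline="") as fh:
    content = fh.read()
os.remove(demo_path)
print(repr(content))
```

Whether this beats the bulk INSERT for half a million rows is something to measure with the existing timing script rather than assume.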
