The Archive Base

Editorial Team
Asked: June 16, 2026

I have this table with more than 7 million rows, and I am LOAD DATA LOCAL INFILE’ing more data into it, on the order of 0.5 million rows at a time. The first few loads were fast, but each addition is taking increasingly long, probably due to indexing overhead:

CREATE TABLE `orthograph_ests` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `digest` char(32) NOT NULL,
  `taxid` int(10) unsigned NOT NULL,
  `date` int(10) unsigned DEFAULT NULL,
  `header` varchar(255) NOT NULL,
  `sequence` mediumblob,
  PRIMARY KEY (`id`),
  UNIQUE KEY `digest` (`digest`),
  KEY `taxid` (`taxid`),
  KEY `header` (`header`)
) ENGINE=InnoDB AUTO_INCREMENT=12134266 DEFAULT CHARSET=latin1

I am developing an application that will run on pre-existing databases. I most likely have no control over server variables unless I make changes to them mandatory (which I would prefer not to), so I’m afraid suggestions like these are of limited use.

I have read that minimizing keys on this table will help. However, I need those keys for later queries. I’m guessing that dropping and re-creating them would take very long as well, but I have not tested this. I have also read that the UNIQUE constraint in particular makes insertion slow. The digest column will take SHA256 digests that must be unique, and I can’t make sure there is no collision (very unlikely, I know, but possible).

Would partitioning help, as suggested here? Could I improve the indexing, e.g., by limiting the key length on the digest column? Should I change to MyISAM, which supports DISABLE KEYS during transactions? What else could I do to improve LOAD DATA performance?

Edit:

After the large insertion, this table is used for SELECTs only, no more writes. This large load is mostly a once-and-done operation; however, about 1,000 datasets (of 0.5M rows each) need to be uploaded before it is finished.

I will be using the digest to look up rows, which is why I indexed that column. If there should be a collision, that individual row should not be uploaded.

Putting the sequence blob in an external file system is probably not a viable option since I cannot easily impose file system changes on the users.

1 Answer

  1. Editorial Team
    Added an answer on June 16, 2026 at 10:56 pm

    This is indeed a large amount of data you are loading, and you should expect it to take many dozens of hours, especially on general purpose shared server hardware. There’s very little magic (unless you work at Google or something) that will make this job anything but a big pain in the neck. So have courage.

    It’s a reference table. That means you should immediately switch to MyISAM and stay there for this table. You don’t need InnoDB’s transactional integrity features, but you do need MyISAM to disable indexing during loading and re-enable it afterward. Re-enabling indexing will take a long time, so be prepared for that.

    You should consider using a shorter hash than SHA-256. SHA-1 (160 bits) is good. Believe it or not, MD5 (128 bits) may also serve. Practical collision attacks exist against MD5 (and against SHA-1), so neither is suitable for secure content authentication. But both are still useful hashes for lookups. A shorter hash means a smaller index, which makes it a better hash from your perspective.
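To make the size difference concrete, here is a minimal sketch using Python's standard `hashlib` module. The payload is a made-up placeholder, not data from the question. It shows how many hex characters each digest needs: a hex-encoded MD5 digest fills a `char(32)` column exactly, while hex-encoded SHA-1 and SHA-256 digests would need wider columns.

```python
import hashlib


def hex_digest_lengths(payload: bytes) -> dict:
    """Return the length of the hex-encoded digest for each algorithm."""
    algorithms = ("md5", "sha1", "sha256")
    return {name: len(hashlib.new(name, payload).hexdigest()) for name in algorithms}


# Hypothetical payload; the digest length does not depend on the input.
lengths = hex_digest_lengths(b"ACGTACGT")
for name, length in lengths.items():
    print(f"{name}: {length} hex characters")
```

Storing the raw bytes instead of hex text would halve each of these sizes, at the cost of a binary column type.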

    Note that disabling indexing MyISAM style (ALTER TABLE ... DISABLE KEYS) only suspends updates to non-unique indexes, so a UNIQUE key on digest is still maintained during the load. You might consider allowing it to be non-unique to save time.

    It’s hard to make a suggestion about partitioning without knowing more about your data and your server hardware. But considering this is a reference database, it seems like it might be wise just to bite the bullet for a couple of weeks and get it loaded.

    If you have plenty of server disk space, you might consider loading each half-megarow chunk into its own table, then inserting it into the big table. That might prove to be a good way to deal with the possibility that you might have to reload the whole thing some day.
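The per-chunk table idea can be sketched roughly as follows. This is an illustration only: it uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs anywhere, the table and column names are hypothetical, and SQLite's `INSERT OR IGNORE` plays the role that `INSERT IGNORE` would play in MySQL. Rows whose digest already exists in the big table are skipped, which matches the requirement that a colliding row should not be uploaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the big table; only two columns are kept.
conn.execute(
    "CREATE TABLE ests ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " digest TEXT NOT NULL UNIQUE,"
    " header TEXT NOT NULL)"
)


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM ests").fetchone()[0]


def load_chunk(conn, chunk_no, rows):
    """Load one chunk into its own table, then copy it into the big table.

    Returns the number of rows actually added to the big table.
    """
    table = f"ests_chunk_{int(chunk_no)}"  # hypothetical naming scheme
    conn.execute(f"CREATE TABLE {table} (digest TEXT NOT NULL, header TEXT NOT NULL)")
    conn.executemany(f"INSERT INTO {table} (digest, header) VALUES (?, ?)", rows)
    before = count_rows(conn)
    # Rows that would violate the UNIQUE constraint on digest are skipped.
    conn.execute(
        f"INSERT OR IGNORE INTO ests (digest, header) SELECT digest, header FROM {table}"
    )
    conn.commit()
    return count_rows(conn) - before


added_first = load_chunk(conn, 1, [("aa", "seq1"), ("bb", "seq2")])
added_second = load_chunk(conn, 2, [("bb", "duplicate"), ("cc", "seq3")])
print(added_first, added_second)
```

Because each chunk table is kept, the big table could be rebuilt from the chunk tables later without re-reading the input files.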

    On shared server hardware, it might make sense to use smaller chunks than half a megarow.
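One way to get smaller chunks is to split the input on the client side before each load. The following is a minimal sketch, assuming the data arrives as an iterable of lines (one row per line); the function name and the chunk size are placeholders.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunk_lines(lines: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most chunk_size lines."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Hypothetical input: ten rows split into chunks of four.
rows = [f"row{i}\n" for i in range(10)]
chunks = list(chunk_lines(rows, 4))
print([len(chunk) for chunk in chunks])
```

Each chunk could then be written to a temporary file and loaded on its own, so that a failure only costs one small load.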

    You might consider making a separate id / digest table. Then you could load your data without the digests and get it done quickly. Then you could write yourself a stored procedure or client that would create the digests in batches of a few thousand rows each until they were done. This only works if the stuff being digested is in your dataset.
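A rough sketch of that batched backfill follows. As before, `sqlite3` stands in for MySQL, the table names are hypothetical, and MD5 is used only as an example hash. It assumes, as the paragraph above says, that the digested content (here the `sequence` column) is already in the table. The loop walks the table by `id` so that a row skipped because of a duplicate digest is not revisited.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ests (id INTEGER PRIMARY KEY, sequence BLOB NOT NULL)")
conn.execute("CREATE TABLE est_digests (id INTEGER PRIMARY KEY, digest TEXT NOT NULL UNIQUE)")
# Hypothetical data loaded without digests; rows 1 and 3 have identical content.
conn.executemany(
    "INSERT INTO ests (sequence) VALUES (?)",
    [(b"ACGT",), (b"GGCC",), (b"ACGT",), (b"TTAA",), (b"CCGG",)],
)


def backfill_digests(conn, batch_size):
    """Compute digests in id order, batch_size rows at a time; return batch count."""
    last_id = 0
    batches = 0
    while True:
        rows = conn.execute(
            "SELECT id, sequence FROM ests WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return batches
        conn.executemany(
            "INSERT OR IGNORE INTO est_digests (id, digest) VALUES (?, ?)",
            [(row_id, hashlib.md5(seq).hexdigest()) for row_id, seq in rows],
        )
        conn.commit()  # one short transaction per batch
        last_id = rows[-1][0]
        batches += 1


batches = backfill_digests(conn, 2)
digest_count = conn.execute("SELECT COUNT(*) FROM est_digests").fetchone()[0]
print(batches, digest_count)
```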

© 2021 The Archive Base. All Rights Reserved