Hi I have a huge unnormalized mysql database with (~100 million) urls (~20% dupes)

Asked: June 12, 2026 (2026-06-12T10:14:19+00:00)

Hi, I have a huge unnormalized MySQL database with ~100 million URLs (~20% duplicates), divided into identical split tables of 13 million rows each.

I want to move the URLs into a normalized database on the same MySQL server.

The old database table is unnormalized, and the url column has no index.
It looks like this:

entry{id,data,data2, data3, data4, possition,rang,url}

I'm going to split it up into multiple tables:

url{id,url}
data{id,data}
data1{id,data}
etc

The first thing I did was

INSERT IGNORE INTO newDatabase.url (url)
SELECT DISTINCT unNormalised.url FROM oldDatabase.unNormalised

But the SELECT DISTINCT unNormalised.url (13 million rows) took ages, and I figured that since INSERT IGNORE INTO also does a comparison, it would be faster to just do:

INSERT IGNORE INTO newDatabase.url (url)
SELECT unNormalised.url FROM oldDatabase.unNormalised

without the DISTINCT. Is this assumption wrong?

Either way, it still takes forever and I need some help. Is there a better way of dealing with this huge quantity of unnormalized data?
Would it be best to run a SELECT DISTINCT unNormalised.url on the entire 100 million row database, export all the ids, and then move only those ids to the new database with, let's say, a PHP script?

All ideas are welcome; I have no clue how to port all this data without it taking a year!

PS: it is hosted on an Amazon RDS server.

Thank you!

1 Answer

Editorial Team · Answer 1 · 2026-06-12T10:14:20+00:00

Since the MySQL manual states that LOAD DATA INFILE is quicker than INSERT, the fastest way to load your data from a file would be:

LOCK TABLES url WRITE;
ALTER TABLE url DISABLE KEYS;
LOAD DATA INFILE 'urls.txt'
    IGNORE
    INTO TABLE url
    ...;
ALTER TABLE url ENABLE KEYS;
UNLOCK TABLES;

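But since you already have the data loaded into MySQL and just need to normalize it, you might try:

-- The source table must be locked as well: while a session holds table
-- locks, it can only access the tables it has locked.
LOCK TABLES url WRITE, oldDatabase.unNormalised READ;
ALTER TABLE url DISABLE KEYS;
INSERT IGNORE INTO url (url)
    SELECT url FROM oldDatabase.unNormalised;
ALTER TABLE url ENABLE KEYS;
UNLOCK TABLES;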
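
If one statement over a whole 13-million-row split table still runs too long, one option is to copy the rows in id ranges so that each statement stays small. This is only a sketch: it assumes that id in the old table is an indexed integer key and that the new url table has a UNIQUE index on url. The range size of one million is arbitrary.

```sql
-- Sketch only: assumes oldDatabase.unNormalised.id is an indexed integer key
-- and newDatabase.url has a UNIQUE index on url. The range size is arbitrary.
INSERT IGNORE INTO newDatabase.url (url)
    SELECT url
    FROM oldDatabase.unNormalised
    WHERE id >= 1 AND id < 1000001;

-- Next range; repeat (for example from a script) until the highest id is covered.
INSERT IGNORE INTO newDatabase.url (url)
    SELECT url
    FROM oldDatabase.unNormalised
    WHERE id >= 1000001 AND id < 2000001;
```

Each range can be retried on its own if it fails, because INSERT IGNORE skips URLs that are already present.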

My guess is that INSERT IGNORE ... SELECT will be faster than INSERT IGNORE ... SELECT DISTINCT, but that's just a guess.
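
Note that INSERT IGNORE only skips duplicates when the target table has a PRIMARY KEY or UNIQUE index that the new row would violate; without one, every row is inserted, duplicates included. A minimal sketch of such a table follows. The column types, the 255-character limit, and the index name uq_url are assumptions, not taken from the question.

```sql
-- Hypothetical definition of the target table; the column types and
-- the 255-character limit are assumptions.
CREATE TABLE newDatabase.url (
    id  INT UNSIGNED NOT NULL AUTO_INCREMENT,
    url VARCHAR(255) NOT NULL,   -- assumed maximum URL length
    PRIMARY KEY (id),
    UNIQUE KEY uq_url (url)      -- lets INSERT IGNORE drop duplicate URLs
);
```

If the real URLs can be longer than the assumed limit, the column and its unique index would need a different design.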
