Hi, I have a huge unnormalized MySQL database with ~100 million URLs (~20% duplicates), divided into identical split tables of 13 million rows each.
I want to move the URLs into a normalized database on the same MySQL server.
The old database table is unnormalized, and the url column has no index.
It looks like this:
entry{id,data,data2, data3, data4, possition,rang,url}
And I'm going to split it up into multiple tables:
url{id,url}
data{id,data}
data1{id,data}
etc
The first thing I did was:
INSERT IGNORE INTO newDatabase.url (url)
SELECT DISTINCT unNormalised.url FROM oldDatabase.unNormalised
But the SELECT DISTINCT unNormalised.url (13 million rows) took ages, and I figured that since INSERT IGNORE INTO also does a comparison, it would be faster to just do:
INSERT IGNORE INTO newDatabase.url (url)
SELECT unNormalised.url FROM oldDatabase.unNormalised
without the DISTINCT. Is this assumption wrong?
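One thing worth checking here: INSERT IGNORE only skips a row when inserting it would violate a PRIMARY KEY or UNIQUE index, so the target table needs a unique index on url for the version without DISTINCT to deduplicate at all. A minimal sketch on scratch tables (url_demo, source_demo, and the VARCHAR(255) length are hypothetical, not the real schema):

```sql
-- Hypothetical scratch tables, only to show the behaviour.
CREATE TEMPORARY TABLE url_demo (
  id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  url VARCHAR(255) NOT NULL,      -- assumed maximum length
  UNIQUE KEY uq_url (url)         -- this index is what INSERT IGNORE checks against
);

CREATE TEMPORARY TABLE source_demo (
  url VARCHAR(255) NOT NULL
);

-- Three source rows, one of them a duplicate.
INSERT INTO source_demo (url) VALUES
  ('http://a.example/'),
  ('http://b.example/'),
  ('http://a.example/');

-- No DISTINCT: the duplicate is dropped by the unique index, not by the SELECT.
INSERT IGNORE INTO url_demo (url)
SELECT url FROM source_demo;

SELECT COUNT(*) FROM url_demo;    -- 2
```

Without the UNIQUE KEY, the same statement would insert all three rows.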
Anyway, it still takes forever and I need some help. Is there a better way of dealing with this huge quantity of unnormalized data?
Would it be best if I did a SELECT DISTINCT unNormalised.url on the entire 100 million row database, exported all the ids, and then moved only those ids to the new database with, let's say, a PHP script?
All ideas are welcome; I have no clue how to port all this data without it taking a year!
PS: it is hosted on an Amazon RDS server.
Thank you!
The MySQL Manual states that LOAD DATA INFILE is quicker than INSERT, so that would be the fastest way to load your data from a file. But since you already have the data loaded into MySQL and just need to normalize it, you might try an INSERT IGNORE ... SELECT straight from the old table.
My guess is that INSERT IGNORE ... SELECT will be faster than INSERT IGNORE ... SELECT DISTINCT, but that's just a guess.
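If a single INSERT IGNORE ... SELECT over 13 million rows runs too long, one option is to copy in primary-key ranges so that each statement stays small. This is only a sketch under assumptions: it presumes oldDatabase.unNormalised has an indexed integer id, that newDatabase.url has a UNIQUE index on url, and the procedure name copy_urls_in_batches and the batch size are made up for illustration.

```sql
DELIMITER //

-- Hypothetical helper: copies urls in id ranges of batch_size rows.
CREATE PROCEDURE copy_urls_in_batches(IN batch_size INT)
BEGIN
  DECLARE lo BIGINT DEFAULT 0;   -- lower bound (exclusive) of the current range
  DECLARE hi BIGINT;             -- highest id in the source table

  SELECT MAX(id) INTO hi FROM oldDatabase.unNormalised;

  WHILE lo < hi DO
    -- Duplicates are skipped by the UNIQUE index on newDatabase.url.url.
    INSERT IGNORE INTO newDatabase.url (url)
    SELECT url
    FROM oldDatabase.unNormalised
    WHERE id > lo AND id <= lo + batch_size;

    SET lo = lo + batch_size;
  END WHILE;
END //

DELIMITER ;

-- Example call with an assumed batch size; repeat per split table.
CALL copy_urls_in_batches(500000);
```

With autocommit on, each batch commits on its own, so an interrupted run can be resumed from the last completed range instead of starting over.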