I’m trying to generate 16 000 000 unique random numbers (10-digits: range 1 000 000 000 – 9 999 999 999) and insert them into an empty table (or fill this table up if not empty).
The table:
CREATE TABLE `codes` (
`code_id` bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`code` bigint(20) UNSIGNED NOT NULL,
`is_used` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`code_id`),
UNIQUE KEY `code` (`code`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
…and the function:
DELIMITER $$
CREATE DEFINER=`root`@`localhost` FUNCTION `codes`(`minRange` BIGINT UNSIGNED, `maxRange` BIGINT UNSIGNED, `_amount` BIGINT UNSIGNED) RETURNS tinyint(1)
MODIFIES SQL DATA
BEGIN
DECLARE pick BIGINT;
while (SELECT count(*) FROM codes) < _amount do
begin
SET pick = minRange + FLOOR(RAND() * (maxRange - minRange + 1));
INSERT IGNORE INTO codes (code) VALUES (pick);
end;
end while;
RETURN 1;
END$$
DELIMITER ;
-- call: SELECT codes(1000000000,9999999999,16000000);
The function is extremaly slow: generating 20k rows takes 2.5 min., so 16M – about 33 hours…
Is there any way to optimise it?
What you are asking for is an oxymoron. If the 16M numbers are unique then they aren’t random. Think about it: a truly 10 digit random number has a prob of 1/9E9 being any given number. Yet your 16M’th number has 0 prob of being one of 15,999,999 numbers and a 1/983,000,001 prob of being one of the remaining ones. You should always expect duplicates. There are roughly 1% for 16M out of 1B.
So my suggestion is to generate (16M + a bit) random numbers, do a unique sort; truncate to 16M then random sort them. My Q&D to do this:
took just over 7 mins on my 4 year old laptop. It will be a lot faster on a modern quad-core desktop. I used PHP-CLI because I have it on my laptop and the mt_rand routine is a good one. You could do this as a 3 liner-C prog or any language to hand. As Zercms says, the load will be quick as long as disable indexes before the load and reenable after. You also only need to load the column
code.Enjoy.