I’m trying to find a way to gather a dataset without looping through 700,000 MySQL queries.
I have two tables:
users: id (auto-increment), time timestamp, username varchar(200), email varchar(100), ip varchar(20)
uniq_ip: ip varchar(20) unique, most_recent datetime, count int
users has 25 million rows and records user activity as people work on the site. uniq_ip lists every IP address that appears in users, along with how many times it appears. A trigger keeps it up to date.
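Here is a rough sketch of the setup described above. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server. The trigger name users_after_insert and its insert-then-update logic are assumptions about how the "on trigger update" bookkeeping could work, not the asker's actual trigger. In MySQL, the same idea is commonly written with INSERT ... ON DUPLICATE KEY UPDATE. The sample rows are made up.

```python
import sqlite3

# SQLite stands in for MySQL; column types approximate the schema above.
SCHEMA = """
CREATE TABLE users (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    time     TEXT NOT NULL,              -- timestamp stored as 'YYYY-MM-DD HH:MM:SS'
    username VARCHAR(200),
    email    VARCHAR(100),
    ip       VARCHAR(20)
);
CREATE TABLE uniq_ip (
    ip          VARCHAR(20) NOT NULL UNIQUE,
    most_recent TEXT,
    count       INTEGER NOT NULL DEFAULT 0
);
-- Hypothetical trigger: create the uniq_ip row the first time an IP is seen,
-- then bump its count and most_recent for every new users row.
CREATE TRIGGER users_after_insert AFTER INSERT ON users
BEGIN
    INSERT OR IGNORE INTO uniq_ip (ip, most_recent, count)
        VALUES (NEW.ip, NEW.time, 0);
    UPDATE uniq_ip
       SET count = count + 1,
           most_recent = MAX(most_recent, NEW.time)
     WHERE ip = NEW.ip;
END;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO users (time, username, email, ip) VALUES (?, ?, ?, ?)",
    [
        ("2012-01-01 10:00:00", "alice", "alice@example.com", "1.2.3.4"),
        ("2012-01-02 09:00:00", "bob", "bob@example.com", "1.2.3.4"),
        ("2012-01-01 12:00:00", "carol", "carol@example.com", "5.6.7.8"),
    ],
)
summary = conn.execute(
    "SELECT ip, most_recent, count FROM uniq_ip ORDER BY ip"
).fetchall()
print(summary)
# [('1.2.3.4', '2012-01-02 09:00:00', 2), ('5.6.7.8', '2012-01-01 12:00:00', 1)]
```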
At the moment, while daydream coding, I get a list of all the IPs from uniq_ip and loop over them to get the 2000 most recent records for each IP. Because uniq_ip has 700,000 rows, this loop is really nasty. It makes 700,000 queries in total, each one of the form
select * from users where ip = '$outerloopip' order by `time` desc limit 2000;
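For comparison, the per-IP loop looks roughly like this. It is again sketched with Python's sqlite3, standing in for the PHP-style code implied by $outerloopip (an assumption). The sketch binds the IP as a parameter instead of interpolating it into the SQL string. The hypothetical per_ip argument replaces the hard-coded 2000 so the toy data stays small. Each distinct IP costs one more round trip to the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT,
                        time TEXT, username TEXT, email TEXT, ip TEXT);
    CREATE TABLE uniq_ip (ip TEXT UNIQUE, most_recent TEXT, count INTEGER);
""")
conn.executemany(
    "INSERT INTO users (time, username, email, ip) VALUES (?, ?, ?, ?)",
    [
        ("2012-01-01 10:00:00", "alice", "alice@example.com", "1.2.3.4"),  # id 1
        ("2012-01-02 09:00:00", "bob", "bob@example.com", "1.2.3.4"),      # id 2
        ("2012-01-03 08:00:00", "dave", "dave@example.com", "1.2.3.4"),    # id 3
        ("2012-01-01 12:00:00", "carol", "carol@example.com", "5.6.7.8"),  # id 4
    ],
)
# Fill uniq_ip in one statement here instead of via a trigger.
conn.execute(
    "INSERT INTO uniq_ip (ip, most_recent, count) "
    "SELECT ip, MAX(time), COUNT(*) FROM users GROUP BY ip"
)

def recent_per_ip_loop(conn, per_ip):
    """One query per distinct IP: the pattern the question wants to avoid."""
    result = {}
    ips = [row[0] for row in conn.execute("SELECT ip FROM uniq_ip ORDER BY ip")]
    for ip in ips:
        # Bound parameters instead of interpolating '$outerloopip' into the SQL.
        cur = conn.execute(
            "SELECT id FROM users WHERE ip = ? ORDER BY time DESC LIMIT ?",
            (ip, per_ip),
        )
        result[ip] = [row[0] for row in cur]
    return result

result = recent_per_ip_loop(conn, per_ip=2)
print(result)  # {'1.2.3.4': [3, 2], '5.6.7.8': [4]}
```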
I’m trying to write a single query that returns the 2000 most recent rows for each IP. If 1.2.3.4 is listed 10,000 times, I only want its 2000 most recent rows, based on the time column.
Any ideas how to do it in one query?
Sorry about my previous answer. I re-read the question and updated the query. I had misread it and thought you wanted only the 2000 most recent IP addresses. This version handles ALL IP addresses and limits the records per IP to 2,000, with the most recent at the top. Make sure you have an index on
(ip, time DESC).
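One way to confirm that the per-IP lookup uses such an index is to ask the query planner. This sketch does that in SQLite via Python. In MySQL, you would run EXPLAIN on the same SELECT instead. The index name idx_users_ip_time is made up. Note that MySQL versions before 8.0 parse DESC in an index definition but still build an ascending index. That index can still be read in reverse for a single-IP ORDER BY time DESC.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT,
                        time TEXT, username TEXT, email TEXT, ip TEXT);
    -- Composite index: equality on ip, then entries ordered newest-first by time.
    CREATE INDEX idx_users_ip_time ON users (ip, time DESC);
""")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM users WHERE ip = ? ORDER BY time DESC LIMIT 2000",
    ("1.2.3.4",),
).fetchall()
# The last column of each plan row holds the human-readable description.
details = [row[-1] for row in plan]
print(details)
```

If the index is picked up, the plan mentions idx_users_ip_time and shows no separate sort step ("USE TEMP B-TREE FOR ORDER BY").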
Then, try this query. The critical thing I failed to explain: in MySQL, the HAVING clause is applied AFTER any GROUP BY or ORDER BY. So the data comes back already sorted by IP address, then by date/time DESCENDING, and only then are the @ user variables applied. Once a record has qualified and is READY to be added to the final result set, the HAVING clause is applied. At THAT moment, it looks at the sequence counter. If the counter is greater than 2000, the record is thrown out and processing moves on to the next record.
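To make the counter logic concrete, here is a Python illustration of the same idea. It is not the MySQL query itself. It assumes rows already arrive sorted by ip, then by time descending. It uses prev_ip and seq (hypothetical names) in place of the @ user variables. The counter resets when the IP changes, and any row numbered above the limit is dropped as it streams past rather than being stored first.

```python
def top_n_per_ip(sorted_rows, limit):
    """Yield at most `limit` rows per IP.

    Assumes rows are sorted by ip, then time descending. prev_ip and seq
    play the role of the @ user variables in the MySQL query.
    """
    prev_ip = None
    seq = 0
    for row in sorted_rows:
        ip = row[0]
        seq = seq + 1 if ip == prev_ip else 1  # reset the counter on a new IP
        prev_ip = ip
        if seq <= limit:  # HAVING-style test: keep only the first `limit` rows
            yield row

raw = [
    ("5.6.7.8", "2012-01-01 12:00:00"),
    ("1.2.3.4", "2012-01-01 10:00:00"),
    ("1.2.3.4", "2012-01-03 08:00:00"),
    ("1.2.3.4", "2012-01-02 09:00:00"),
]
# Sort newest-first, then stable-sort by ip so each IP's rows stay newest-first.
ordered = sorted(sorted(raw, key=lambda r: r[1], reverse=True), key=lambda r: r[0])
kept = list(top_n_per_ip(ordered, limit=2))
print(kept)
```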
My original query saved everything, then cycled through a second time to kick out the rows numbered above 2000. That was probably why it was blowing away your disk space.
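On MySQL 8.0 or later (an assumption about your server version), the per-IP numbering can be written with the ROW_NUMBER() window function instead of @ user variables. MySQL documents the evaluation order of user variables as undefined. This sketch runs the query through Python's sqlite3 on a trimmed users table with made-up rows. SQLite 3.25 and later accept the same window syntax. The limit is a bound parameter so the sample stays small.

```python
import sqlite3

if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("window functions need SQLite 3.25 or later")

conn = sqlite3.connect(":memory:")
# Trimmed to the columns the query needs.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, time TEXT, ip TEXT)")
conn.executemany(
    "INSERT INTO users (time, ip) VALUES (?, ?)",
    [
        ("2012-01-01 10:00:00", "1.2.3.4"),  # id 1
        ("2012-01-02 09:00:00", "1.2.3.4"),  # id 2
        ("2012-01-03 08:00:00", "1.2.3.4"),  # id 3
        ("2012-01-01 12:00:00", "5.6.7.8"),  # id 4
    ],
)

# Number each IP's rows newest-first, then keep rows numbered <= the limit.
TOP_N_PER_IP = """
SELECT id, ip, time
FROM (
    SELECT id, ip, time,
           ROW_NUMBER() OVER (PARTITION BY ip ORDER BY time DESC, id DESC) AS seq
    FROM users
) AS ranked
WHERE seq <= ?
ORDER BY ip, time DESC
"""

top = conn.execute(TOP_N_PER_IP, (2,)).fetchall()
print(top)
```

Against the real table, you would use 2000 as the limit. The query still processes all 25 million rows, but it does so in one statement rather than 700,000 separate ones. The id DESC tiebreaker only makes the order deterministic when two rows share a timestamp.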