I’m building an activity stream, so old messages fall off the end and will never get seen again. So, I’d like to regularly delete old messages. However, I don’t want it to look like a user has no activity, so I want to leave a minimum of X posts in each stream.
Simplified, my “messages” table has a “mid” (message ID, primary key), a “uid” (user ID), a “created” date (UNIX timestamp), and a “message.” So what I want is to do something like this:
DELETE FROM messages
WHERE created < ? AND mid NOT IN (
SELECT mid FROM messages m GROUP BY uid HAVING mid > (
SELECT mid FROM messages WHERE uid = m.uid ORDER BY mid LIMIT 1 OFFSET ?
)
)
However, when I try the query inside NOT IN() by itself, it returns no results. Additionally, I don’t think it works to SELECT from the same table you’re deleting from.
I’m okay with 2 queries if necessary, but I hope it isn’t necessary.
I tried searching around and there are a bunch of results on how to delete all but the last record, but I didn’t find anything about deleting all but the last N records.
Look at this question for a [mysql] solution to the “top n per group” problem:
How to SELECT the newest four items per category?
Once you have the set consisting of all rows in the “top n per group” (that’s what you want to keep). You just have to delete those rows not in that set:
Here’s another link that looks at the same problem:
http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/