I’ve got a problem where I want to delete orphaned records. I was wondering what the syntax is for deleting records that aren’t in a join.
So if my query to get the stuff (that I don’t want to delete) is:
select * from tbl_user tu
inner join tbl_user_group_xref tugx on tu.userid=tugx.userid
Then how do I
1) get the stuff that isn’t in the join and 2) delete it?
I’d like to do it without using arrays, but an array solution would still be useful for learning purposes.
There’s an optimization to Duncan Howe’s answer that I know works in MySQL and may work with other servers. It probably also works for t-clausen.dk’s answer in MySQL.
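For context, the unoptimized form is a plain anti-join. A minimal sketch on the question’s tables, in MySQL syntax (this is an assumed baseline, not a quote from either answer):

```sql
-- 1) Preview the orphans: users with no matching row in the xref table.
SELECT tu.*
FROM tbl_user tu
LEFT JOIN tbl_user_group_xref tugx ON tu.userid = tugx.userid
WHERE tugx.userid IS NULL;

-- 2) Delete them, using MySQL's multi-table DELETE form.
DELETE tu
FROM tbl_user tu
LEFT JOIN tbl_user_group_xref tugx ON tu.userid = tugx.userid
WHERE tugx.userid IS NULL;
```

Running the SELECT first is a cheap way to check what the DELETE would remove.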
If you are deleting rows from table t1 that don’t have corresponding rows in t2, and both tables are very large, the server can end up swamped with disk seeks. I found that performance improves a lot if you force the server to load t2’s index into memory before running the query and then, in the query, force the server to ignore t1’s index. That makes the server do a sequential scan of t1, which is an efficient use of the disk. The server steps through each row of t1 and looks it up in t2’s index, which is in memory, to determine whether the row should be deleted. The disk seeks are thus eliminated and the disk I/O rate is very high, which keeps the CPU busy.
For example:
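A minimal sketch in MySQL syntax, using the question’s tables as t1 (tbl_user) and t2 (tbl_user_group_xref). The index names PRIMARY and userid are assumptions, and the preload statement only applies to MyISAM tables:

```sql
-- Preload t2's index blocks into the MyISAM key cache.
LOAD INDEX INTO CACHE tbl_user_group_xref;

-- Delete users with no xref row. IGNORE INDEX (PRIMARY) pushes the server
-- toward a sequential scan of tbl_user; FORCE INDEX (userid) makes each
-- lookup go through the (now cached) index on tbl_user_group_xref.userid.
DELETE tu
FROM tbl_user tu IGNORE INDEX (PRIMARY)
LEFT JOIN tbl_user_group_xref tugx FORCE INDEX (userid)
  ON tu.userid = tugx.userid
WHERE tugx.userid IS NULL;
```

Running EXPLAIN on the equivalent SELECT beforehand should show whether the hints produce the intended plan.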
(I’m assuming that tbl_user.userid is its table’s PK and the index on tbl_user_group_xref.userid is named userid. If not, change the respective key names.)
Forcing a server to load an index into memory is technology-specific. In MySQL, for MyISAM tables, you can use LOAD INDEX INTO CACHE. Recreating an index from scratch (which is very fast in MySQL) might leave it in cache (and would have the nice side effect of balancing the B-tree).
I’ve seen examples with well over 100x improvement using this optimization. So long as you can cache t2’s index, you can process very large tables efficiently.
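As a rough sketch of the rebuild option, plus a way to check whether the index can fit in the cache (again assuming a MyISAM table and an index named userid; the ALTER TABLE form is one way to recreate the index, not necessarily the fastest):

```sql
-- Rebuild the index from scratch in a single statement.
ALTER TABLE tbl_user_group_xref
  DROP INDEX userid,
  ADD INDEX userid (userid);

-- Compare the table's Index_length (all of its indexes, in bytes)
-- with the size of the MyISAM key cache.
SHOW TABLE STATUS LIKE 'tbl_user_group_xref';
SHOW VARIABLES LIKE 'key_buffer_size';
```

If Index_length is larger than key_buffer_size, the index cannot stay fully cached and the lookups may go back to hitting the disk.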