I’m running a MySQL query in two steps. First, I get a list of ids with one query, and then I retrieve the data for those ids using a second query along the lines of SELECT * FROM data WHERE id IN (id1, id2 ...). I know it sounds hacky, but I’ve done it this way because the queries are very complicated: the first involves lots of geometry and trigonometry, the second lots of different joins. I’m sure they could be written as a single query, but my MySQL isn’t good enough to pull it off.
This approach works, but it doesn’t feel right; plus I’m concerned it won’t scale. At the moment I am testing on a database of 10,000 records, with 400 ids in the “IN” clause (i.e. IN (id1, id2 ... id400)), and performance is fine. But what if there are, say, 1,000,000 records?
Where are the performance bottlenecks (speed, memory, etc.) for this kind of query? Any ideas for how to refactor this kind of query would be awesome too (for example, whether it is worth swotting up on stored procedures).
Starting from a certain number of records, the IN predicate over a SELECT becomes faster than that over a list of constants.
See this article in my blog for a performance comparison.
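To make the two forms concrete, here is a minimal sketch that runs both and checks they return the same rows. It uses Python's built-in sqlite3 module purely as a runnable stand-in for MySQL; the points table, the columns of data, and the bounding-box query standing in for the complicated geometry query are all assumptions made up for illustration. It shows only that the two forms are interchangeable, not how fast either one is on MySQL.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE points (id INTEGER PRIMARY KEY, x REAL, y REAL);  -- hypothetical
    CREATE TABLE data   (id INTEGER PRIMARY KEY, label TEXT);      -- hypothetical
""")
conn.executemany("INSERT INTO points VALUES (?, ?, ?)",
                 [(i, i % 100, i // 100) for i in range(1, 1001)])
conn.executemany("INSERT INTO data VALUES (?, ?)",
                 [(i, f"row {i}") for i in range(1, 1001)])

# Stand-in for the complicated first query: ids inside a bounding box.
id_query = "SELECT id FROM points WHERE x BETWEEN 10 AND 19 AND y BETWEEN 2 AND 5"

# Two-step version: fetch the ids, then send them back as a list of constants.
ids = [row[0] for row in conn.execute(id_query)]
placeholders = ", ".join("?" * len(ids))
two_step = conn.execute(
    f"SELECT id, label FROM data WHERE id IN ({placeholders}) ORDER BY id", ids
).fetchall()

# One-step version: the id query becomes a subquery of the IN predicate,
# so the ids never leave the database.
one_step = conn.execute(
    f"SELECT id, label FROM data WHERE id IN ({id_query}) ORDER BY id"
).fetchall()

print(two_step == one_step, len(one_step))
```

The one-step form also avoids building and parsing a statement whose length grows with the number of ids.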
If the column used in the query in the IN clause is indexed, then this query is just optimized to an EXISTS (which uses just one entry for each record from table1).
Unfortunately, MySQL is not capable of doing HASH SEMI JOIN or MERGE SEMI JOIN, which are yet more efficient (especially if both columns are indexed).
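As a rough illustration of the IN-to-EXISTS rewrite described above, the sketch below runs both forms and checks that they select the same rows. Again it uses Python's sqlite3 module as a stand-in for MySQL; table2, the value column, and the index name are hypothetical (only table1 is named in the text). It demonstrates the logical equivalence only and says nothing about what MySQL's optimizer actually does.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, value INTEGER);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, value INTEGER);  -- hypothetical
    CREATE INDEX ix_table2_value ON table2 (value);               -- the indexed column
""")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(i, i % 50) for i in range(1, 501)])
conn.executemany("INSERT INTO table2 VALUES (?, ?)",
                 [(i, i * 5) for i in range(1, 11)])

# IN predicate over a SELECT.
in_rows = conn.execute("""
    SELECT id FROM table1
    WHERE value IN (SELECT value FROM table2)
    ORDER BY id
""").fetchall()

# Equivalent correlated EXISTS: for each record of table1,
# probe table2 for a matching value.
exists_rows = conn.execute("""
    SELECT id FROM table1 AS t1
    WHERE EXISTS (SELECT 1 FROM table2 AS t2 WHERE t2.value = t1.value)
    ORDER BY id
""").fetchall()

print(in_rows == exists_rows, len(in_rows))
```

The two forms are interchangeable here because value contains no NULLs; with NOT IN and nullable columns they would no longer behave the same.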