I come across several instances where I can write a query using either joins or subqueries. I usually use joins but sometimes use subqueries (for no particular reason). I have read in several places (including Stack Overflow) that joins are faster than subqueries in many instances, but that sometimes subqueries are faster. Right now the queries I am writing do not deal with large amounts of data, so I guess speed isn’t much of a concern. But for the future, I’m curious about the following.
a.) Why are joins faster than subqueries (in general)?
b.) In which cases are subqueries faster? How will I know?
c.) If I’m writing a query, how should I judge whether to use a subquery or a join? I would appreciate it if someone could explain this with an example.
Here are the answers to your questions.
a) Joins aren’t faster than subqueries (in general). But DBMSs often produce a much smarter execution plan if you use joins. This is related to the way queries are transformed into execution plans.
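To make the point about execution plans concrete, here is a minimal sketch that writes the same question ("which customers have at least one order?") once as a join and once as a subquery, then asks the database for the plan of each. Everything in it is an assumption for illustration: the `customers` and `orders` tables are hypothetical, and SQLite (via Python's standard `sqlite3` module) stands in for whatever DBMS you actually use. Other systems expose plans through their own `EXPLAIN` variants and may choose very different plans.

```python
import sqlite3

# Hypothetical schema and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 25.0), (3, 3, 5.0);
""")

# Same question, formulated as a join ...
JOIN_SQL = """
    SELECT DISTINCT c.name
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.name
"""

# ... and as a subquery.
SUBQUERY_SQL = """
    SELECT c.name
    FROM customers AS c
    WHERE c.id IN (SELECT o.customer_id FROM orders AS o)
    ORDER BY c.name
"""


def rows(sql):
    """Run the query and return the first column of every row."""
    return [r[0] for r in conn.execute(sql)]


def plan(sql):
    """Return the human-readable steps of SQLite's plan for the query."""
    # The last column of each EXPLAIN QUERY PLAN row is the description.
    return [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql)]


join_rows = rows(JOIN_SQL)
subquery_rows = rows(SUBQUERY_SQL)
join_plan = plan(JOIN_SQL)
subquery_plan = plan(SUBQUERY_SQL)

print("join result:    ", join_rows)
print("subquery result:", subquery_rows)
for step in join_plan:
    print("join plan:    ", step)
for step in subquery_plan:
    print("subquery plan:", step)
```

Both formulations return the same rows; what may differ is the plan the optimizer picks for each, and that plan is what determines the speed. The exact plan text depends on the SQLite version, so treat it as something to inspect rather than something to rely on.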
b) c) In general there are no fixed rules for writing fast queries, and there is only one reliable way to choose the right query for your task: benchmark the different versions. So if you have to decide how to formulate a certain query, benchmark the first version, and if it performs well, stop. Otherwise change something and benchmark again until the performance is acceptable. Use an environment that is close to your production environment. Use realistic datasets: a query might perform well with thousands of records but not with millions. Use the same hardware as in production. Consider benchmarking the query in the context of your application, since the other queries the application runs may influence its performance.
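The benchmarking loop described above might look like the following sketch. It is not a definitive harness: the schema, the generated data, the two candidate queries, and the helper names (`build_db`, `benchmark`, `compare`) are all hypothetical, and an in-memory SQLite database stands in for your real DBMS. In practice you would point the same loop at a copy of your production data on production-like hardware.

```python
import sqlite3
import timeit


def build_db(n_customers, orders_per_customer):
    """Create an in-memory database with synthetic (hypothetical) data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        ((i, f"customer{i}") for i in range(n_customers)),
    )
    # Only even-numbered customers get orders, so the result is non-trivial.
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        (
            (c, 1.0)
            for c in range(0, n_customers, 2)
            for _ in range(orders_per_customer)
        ),
    )
    conn.commit()
    return conn


# Two formulations of "how many customers have at least one order?".
CANDIDATES = {
    "join": (
        "SELECT COUNT(*) FROM ("
        "SELECT DISTINCT c.id FROM customers AS c "
        "JOIN orders AS o ON o.customer_id = c.id)"
    ),
    "subquery": (
        "SELECT COUNT(*) FROM customers AS c "
        "WHERE c.id IN (SELECT customer_id FROM orders)"
    ),
}


def benchmark(conn, sql, repeat=5):
    """Return the best wall-clock time in seconds over `repeat` runs."""
    def run():
        conn.execute(sql).fetchall()

    return min(timeit.repeat(run, number=1, repeat=repeat))


def compare(n_customers, orders_per_customer=3):
    """Check that the candidates agree, then time each of them."""
    conn = build_db(n_customers, orders_per_customer)
    results = {}
    for name, sql in CANDIDATES.items():
        count = conn.execute(sql).fetchone()[0]
        results[name] = (count, benchmark(conn, sql))
    conn.close()
    return results


if __name__ == "__main__":
    # Repeat at several sizes: a query that is fine with thousands of
    # records may behave differently with far more.
    for size in (1_000, 20_000):
        for name, (count, seconds) in compare(size).items():
            print(f"{size:>6} customers  {name:<8}  "
                  f"count={count}  best={seconds:.6f}s")
```

Checking that both candidates return the same count before timing them guards against comparing two queries that are not actually equivalent. The timings themselves only mean something for the data, indexes, and DBMS you run them against.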