Suppose I have a table called Projects with a column called Budget with a standard B-Tree index. The table has 50,000 projects, and only 1% of them have a Budget of over one million. If I ran the SQL Query:
SELECT * From Projects WHERE Budget > 1000000;
The planner will use an index range scan on Budget to get the rows off the heap table. However, if I use the query:
SELECT * From Projects WHERE Budget > 50;
The planner will most likely do a sequential scan on the table, as it will know this query will end up returning most or all rows anyway and there’s no reason to load all the pages of the index into memory.
Now, let’s say I run the query:
SELECT * From Projects WHERE Budget > :budget;
Where :budget is a bind parameter passed into my database. From what I’ve read, the query as above will be cached, and no data on cardinality can be inferred. In fact, most databases will just assume an even distribution and the cached query plan will reflect that. This surprised me, as usually when you read about the benefits of bind parameters it’s on the subject of preventing SQL injection attacks.
Obviously, this could improve performance if the resulting query plan would be the same, as a new plan wouldn’t have to be compiled, but could also hurt performance if the values of :budget greatly varied.
My Question: Why are bind parameters not resolved before the query plan is generated and cached? Shouldn’t modern databases strive to generate the best plan for the query, which should mean looking at the value for each parameter and getting accurate index stats?
Note: This question probably doesn’t apply to mySql as mySql doesn’t cache SQL plans. However, I’m interested in why this is the case on Postgres, Oracle and MS SQL.
Don’t confuse parameterized queries with prepared statements. Both offer parameterization, but prepared statements offer the additional caching of the query plan.
Because sometimes generating the query plan is an expensive step. Prepared statements allow you to amortize the cost of query planning.
However, if all you’re looking for is SQL injection protection, don’t use prepared statements. Use parameterized queries.
For example, in PHP, you can use http://php.net/pg_query_params to execute a parameterized query WITHOUT caching the query plan; meanwhile http://php.net/pg_prepare and http://php.net/pg_execute are used to cache a plan for a prepared statement and later execute it.
Edit: 9.2 apparently changes the way prepared statements are planned