I am working on an ASP.NET MVC project that allows users to construct arbitrarily complex queries by adding items one clause at a time.
The application then generates appropriate SQL, runs it (currently using SQL Server 2008) and displays the results, with a breakdown which includes the number of matching records for each added item.
e.g.
UserQuery:
Has ConditionA (45)
And ConditionB (33)
Or ConditionC (55)
And ConditionD (15)
Total: 48
The problem is how best to build and run the SQL to produce these results with performance and scalability in mind.
The initial implementation built a query (using subqueries) for each item (combined with the previous ones) in turn and ran each one separately as a scalar query. Each execution involved generating the SQL, opening a new SqlConnection, creating a new SqlCommand and executing it.
I spent a while re-writing this to produce a single query (which uses CTEs) to return a single row with the result of each item as a column.
This required only a single execution, and performance seemed marginally better, until the queries became complex and SQL Server started throwing errors:
The query processor ran out of internal resources and could not produce a query plan
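To make the shape of that single query concrete, here is a minimal sketch. It uses Python with an in-memory SQLite database purely as a runnable stand-in for SQL Server, and the `records` table, its flag columns `a` to `d`, and the `build_cte_query` helper are all hypothetical. It assumes clauses are applied strictly left to right and that each column holds the running count after that clause. Note that every CTE refers to the one before it, so the statement grows with each clause; as far as I know SQL Server generally expands CTE references inline rather than materializing them, which may be part of why the optimizer eventually gives up.

```python
import sqlite3

# Hypothetical table and flag columns; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, a INT, b INT, c INT, d INT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 0, 1), (2, 1, 1, 0, 0), (3, 1, 0, 0, 1), (4, 1, 0, 1, 1),
     (5, 0, 0, 1, 1), (6, 0, 0, 1, 0), (7, 0, 1, 0, 1), (8, 0, 0, 0, 0)],
)

SET_OPS = {"AND": "INTERSECT", "OR": "UNION"}

def build_cte_query(clauses):
    """Build one statement with a CTE per clause and a count column per CTE.

    clauses is a list of (operator, predicate); the first operator is ignored.
    Predicates must come from the application's own generator, never raw user input.
    """
    ctes, columns = [], []
    for step, (op, predicate) in enumerate(clauses):
        matches = f"SELECT id FROM records WHERE {predicate}"
        if step == 0:
            body = matches
        else:
            # Each CTE is defined in terms of the previous one.
            body = f"SELECT id FROM s{step - 1} {SET_OPS[op]} {matches}"
        ctes.append(f"s{step} AS ({body})")
        columns.append(f"(SELECT COUNT(*) FROM s{step}) AS c{step}")
    return "WITH " + ", ".join(ctes) + " SELECT " + ", ".join(columns)

clauses = [(None, "a = 1"), ("AND", "b = 1"), ("OR", "c = 1"), ("AND", "d = 1")]
sql = build_cte_query(clauses)
row = conn.execute(sql).fetchone()  # one row, one running count per clause
print(row)
```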
What would be the most scalable and efficient way of building and running such a query?
The way forward for us was to construct the query using a temporary table for each clause, where each subsequent clause is applied (via UNION/INTERSECT/EXCEPT) to the temporary table produced by the previous clause.
A temp table is also created for the results; it is updated with the Id and row count of each clause's temp table as that table is populated.
When the query has been processed, the results are returned by selecting all rows from the results temp table, which gives a full item-by-item breakdown.
This avoided the need for gigantic SQL statements featuring many nested subqueries and also avoided repeatedly re-executing the same SQL, giving a massive improvement in scalability and performance.
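The temp-table approach described above can be sketched roughly as follows. This is not our production code: it uses Python with an in-memory SQLite database as a runnable stand-in for SQL Server, and the `records` table, the `step_N` and `results` table names, and the `run_breakdown` helper are hypothetical. It assumes clauses are applied strictly left to right.

```python
import sqlite3

# Hypothetical table and flag columns; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, a INT, b INT, c INT, d INT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 0, 1), (2, 1, 1, 0, 0), (3, 1, 0, 0, 1), (4, 1, 0, 1, 1),
     (5, 0, 0, 1, 1), (6, 0, 0, 1, 0), (7, 0, 1, 0, 1), (8, 0, 0, 0, 0)],
)

SET_OPS = {"AND": "INTERSECT", "OR": "UNION", "AND NOT": "EXCEPT"}

def run_breakdown(conn, clauses):
    """Materialize one temp table per clause and record its row count.

    clauses is a list of (operator, label, predicate); the first operator is ignored.
    Predicates must come from the application's own generator, never raw user input.
    """
    cur = conn.cursor()
    cur.execute("CREATE TEMP TABLE results (step INTEGER, label TEXT, row_count INTEGER)")
    for step, (op, label, predicate) in enumerate(clauses):
        matches = f"SELECT id FROM records WHERE {predicate}"
        if step == 0:
            body = matches
        else:
            # Only the previous step's temp table is read, so each statement stays small.
            body = f"SELECT id FROM step_{step - 1} {SET_OPS[op]} {matches}"
        cur.execute(f"CREATE TEMP TABLE step_{step} AS {body}")
        cur.execute(
            f"INSERT INTO results SELECT ?, ?, COUNT(*) FROM step_{step}",
            (step, label),
        )
    return cur.execute(
        "SELECT step, label, row_count FROM results ORDER BY step"
    ).fetchall()

breakdown = run_breakdown(conn, [
    (None, "Has ConditionA", "a = 1"),
    ("AND", "And ConditionB", "b = 1"),
    ("OR", "Or ConditionC", "c = 1"),
    ("AND", "And ConditionD", "d = 1"),
])
for step, label, row_count in breakdown:
    print(f"{label} ({row_count})")
```

Because each statement touches only the previous temp table and one new predicate, its size stays constant no matter how many clauses the user adds, and the whole batch can run on a single connection.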