I’ve come across a scenario where I need to return a complex set of calculated values at a crossover point from “legacy” to current.
To cut a long story short I have something like this …
with someofit as
(
    select id, col1, col2, col3 from table1
)
select someofit.*,
    case when id < @lastLegacyId then
        (select ... from table2 where something = id) as 'bla'
        ,(select ... from table2 where something = id) as 'foo'
        ,(select ... from table2 where something = id) as 'bar'
    else
        (select ... from table3 where something = id) as 'bla'
        ,(select ... from table3 where something = id) as 'foo'
        ,(select ... from table3 where something = id) as 'bar'
    end
from someofit
Now here lies the problem …
I don’t want to be constantly doing that case check for each sub selection, but at the same time, when that condition applies, I need all of the selections within the relevant case block.
Is there a smarter way to do this?
If I were in a proper OO language I would use something like this …
var common = GetCommonStuff();
foreach (var item in common)
{
    if (item.id <= lastLegacyId)
    {
        AppendLegacyValuesTo(item);
    }
    else
    {
        AppendCurrentValuesTo(item);
    }
}
I did initially try doing 2 complete selections with a union all but this doesn’t work very well due to efficiency / number of rows to be evaluated.
The sub selections are looking for total row counts where some condition is met other than the id match on either table 2 or 3 but those tables may have millions of rows in them.
The CTE is used for 2 reasons …
Firstly, it pulls only the rows from table 1 I am interested in, so straight away I’m only doing a fraction of the sub selections in each case.
Secondly, it returns the common stuff in a single lookup on table 1.
Any ideas?
EDIT 1 :
Some context to the situation …
I have a table called “imports” (table 1 above). This represents an import job where we take data from a file (CSV or similar) and pull the records into the DB.
I then have a table called “steps”. This represents the processing / cleaning rules we go through, and each record contains a sproc name and a bunch of other stuff about the rule.
There is then a join table, “ImportSteps” (table 2 above – for current data), that represents the rule for a particular import. It contains a “rowsaffected” column and the import id.
So for the current jobs my SQL is quite simple …
select 123 456
from imports
join importsteps
For the older legacy stuff, however, I have to look through table 3 … Table 3 is the holding table. It contains every record ever imported; each row has an import id and contains key values.
On the new data, rowsaffected on table 2 for import id x where step id is y will return my value.
On the legacy data, I have to count the rows in holding where col z = something.
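The two lookups described above could be sketched roughly as follows. This is only an illustration: the table variables, the column names (importid, stepid, colz) and the sample values are hypothetical stand-ins, since the real schema isn’t shown here.

```sql
-- Hypothetical stand-ins for ImportSteps (table 2) and holding (table 3).
declare @importsteps table (importid int not null, stepid int not null, rowsaffected int not null);
declare @holding table (importid int not null, colz varchar(20) not null);

insert into @importsteps (importid, stepid, rowsaffected) values (10, 1, 250), (10, 2, 40);
insert into @holding (importid, colz) values (5, 'something'), (5, 'something'), (5, 'other');

-- Current import: the value is already stored, so it is a single-row read.
select rowsaffected as bla
from @importsteps
where importid = 10 and stepid = 2;

-- Legacy import: the value has to be counted from the holding rows.
select count(*) as bla
from @holding
where importid = 5 and colz = 'something';
```

The first read touches one small row per import and step; the second has to scan or seek every holding row for the import, which is presumably where the cost comes from on a table of that size.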
I need data on about 20 imports, and this data is bound to a “datagrid” on my MVC web app (if that makes any difference).
The CTE I use determines, through some parameters, the “current 20 I’m interested in”. Those params represent the start and end record (ordered by import id).
My biggest issue is that holding table … it’s massive. Individual jobs have been known to contain 500k+ records on their own, and this table holds years of imported rows, so I need my lookups on that table to be as fast as possible and as few as possible.
EDIT 2:
The actual solution (pseudo code only) …
-- declare and populate the subset to reduce reads on the big holding table
declare @holding table ( ... )
insert into @holding
select .. from holding
select
... common stuff from inner select in "from" below
... bunch of ...
case when id < @legacy then (select getNewValue(id, stepid))
else (select x from @holding where id = ID and ... ) end as 'bla'
from
(
select ROW_NUMBER() over (order by importid desc) as 'RowNum'
, ...
) as I
-- this bit handles the paging
where RowNum >= @StartIndex
and RowNum < @EndIndex
I’m still confident I can clean it up more, but my original query, which looked something like Bill’s solution, took about 45 seconds to execute; this one takes about 7.
I take it the subqueries must return a single scalar value, correct? This point is important because it is what ensures the LEFT JOINs will not multiply the result.
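The query being discussed isn’t reproduced here, so what follows is only a rough sketch of the pattern, not necessarily the exact query meant. Each branch of the case becomes a LEFT JOIN whose ON clause carries the legacy test, and each joined set is pre-aggregated to one row per id so the join can’t multiply rows. The table variables, the flag column and the sample data are assumptions made up for illustration; the mapping of branches to table2 and table3 follows the pseudo-SQL in the question.

```sql
-- Hypothetical stand-ins for table1, table2 and table3.
declare @table1 table (id int not null primary key, col1 varchar(10) not null);
declare @table2 table (something int not null, flag bit not null);
declare @table3 table (something int not null, flag bit not null);
declare @lastLegacyId int = 3;

insert into @table1 (id, col1) values (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
insert into @table2 (something, flag) values (1, 1), (1, 0), (2, 1);
insert into @table3 (something, flag) values (3, 1), (4, 1), (4, 0);

with someofit as
(
    select id, col1 from @table1
)
select s.id, s.col1,
       -- At most one of the two joins can match a given row, so coalesce picks it.
       coalesce(t2.bla, t3.bla, 0) as bla,
       coalesce(t2.foo, t3.foo, 0) as foo
from someofit as s
left join
(
    -- One row per id, so the join cannot multiply the result.
    select something,
           count(*) as bla,
           sum(case when flag = 1 then 1 else 0 end) as foo
    from @table2
    group by something
) as t2
    on t2.something = s.id and s.id < @lastLegacyId
left join
(
    select something,
           count(*) as bla,
           sum(case when flag = 1 then 1 else 0 end) as foo
    from @table3
    group by something
) as t3
    on t3.something = s.id and s.id >= @lastLegacyId
order by s.id;
```

The legacy test is evaluated once per join rather than once per returned value, and every value from the matching branch comes along with it. Whether the optimizer restricts the aggregates to the ids in the CTE, rather than grouping the whole table first, would need to be checked in the execution plan.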
Beware that I have used id >= @lastLegacyId as the complement of the condition, by assuming that id is not nullable. If it is, you need an IsNull there, i.e. somefit.id >= isnull(@lastLegacyId, somefit.id).
Your edit to the question doesn’t change the fact that this is an almost literal translation of the O-O syntax.
Now, if you have actually tried this and it doesn’t solve your problem, I’d like to know where it broke.