I need to calculate the installed base for units that have different placements (shipments) in different countries and “environments” over many years, given a “retirement rate” curve assigned to each unit. The placements, curve definitions, and curve assignments are stored in separate database tables (DDL and sample data below, also on SQLFiddle.com). The formula for calculating installed base is as follows:

where 1990 is the first year for which we have placement data.
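The formula itself is not reproduced above. Reading it off the sample query further down, it appears to be a sum over year offsets, sketched here under that assumption (the symbols IB, P, and R are my own shorthand, not the original notation):

```latex
\mathrm{IB}_{u,c,e}(y) \;=\; \sum_{k=0}^{y-1990} P_{u,c,e}(y-k)\, R_{u,c,e,y}(k)
```

Here P(y-k) is the placement for unit u, country c, and environment e in year y-k, and R(k) is the rate at YearOffset k of the curve assigned to that unit/country/environment for year y. For the sample data this gives IB(1991) = 100 * 1 + 100 * 0.5 = 150.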
The problem:
Doing these calculations with datasets of 3 to 16 million rows of unit/country/environment/year placement combinations takes much more time than the target load/calculation time of 30 seconds to 1 minute.
SQL Server approach:
When PIVOTed so that each year becomes its own column, I get anywhere from 100,000 to 400,000 returned rows of raw data (placements + rates), which takes about 8-15 seconds. However, if I calculate the installed base directly in SQL, as in the statement included below, it takes at least 10 minutes.
We’ve also tried a SQL trigger solution that updated the installed base each time a placement or rate was modified, but it made batch updates unreasonably slow and was also unreliable. I suppose this could merit more investigation if it really were the best option.
Excel-VSTO approach (so far, the fastest approach):
This data ultimately ends up in a C# VSTO-powered Excel workbook, where it was calculated via a series of VLOOKUPs. However, when loading 150,000 placements across 6 years with about 20 VLOOKUPs per cell (about 20 million VLOOKUPs), Excel crashes. When the VLOOKUPs are done in smaller batches and the formulas are converted into values, it doesn’t crash, but it still takes much longer than one minute to calculate.
The question:
Is there some mathematical or programmatic construct that would help me to calculate this data via C# or SQL more efficiently than I’ve been doing? Brute force iteration is also too slow, so that’s not an option either.
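One construct that fits: if a single retirement curve applies to a whole unit/country/environment series, the installed base is a discrete convolution of the placement series with the curve. The following is a minimal Python sketch under that assumption, not a tuned implementation; the function name `installed_base_series` is hypothetical, and the input values are the sample rows from this post.

```python
def installed_base_series(placements, curve):
    """placements[i] = units placed in year first_year + i;
    curve[k] = fraction of a placement still installed k years later."""
    result = []
    for t in range(len(placements)):
        # Offsets past the end of the curve count as fully retired.
        max_offset = min(t, len(curve) - 1)
        result.append(
            sum(placements[t - k] * curve[k] for k in range(max_offset + 1))
        )
    return result


# Sample data from this post: 100 units placed in 1990 and again in 1991;
# curve 1 keeps 100% at offset 0 and 50% at offset 1.
series = installed_base_series([100.0, 100.0], [1.0, 0.5])
print(dict(zip([1990, 1991], series)))  # {1990: 100.0, 1991: 150.0}
```

The work per series is proportional to (number of years) x (curve length), so the total cost grows linearly with the number of unit/country/environment combinations rather than with repeated lookups per cell.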
-- One row per unit/environment/country/year with the number of units placed.
DECLARE @Placements TABLE
(
    UnitId int not null,
    Environment varchar(50) not null,
    Country varchar(100) not null,
    YearColumn smallint not null,
    Placement decimal(18,2) not null,
    PRIMARY KEY (UnitId, Environment, Country, YearColumn)
)

-- Which retirement curve (RateId) applies to a unit/environment/country/year.
DECLARE @CurveAssignments TABLE
(
    UnitId int not null,
    Environment varchar(50) not null,
    Country varchar(100) not null,
    YearColumn smallint not null,
    RateId int not null,
    PRIMARY KEY (UnitId, Environment, Country, YearColumn)
)

-- The curve itself: one rate per year offset.
DECLARE @CurveDefinitions TABLE
(
    RateId int not null,
    YearOffset int not null,
    Rate decimal(18,2) not null,
    PRIMARY KEY (RateId, YearOffset)
)

INSERT INTO @Placements
    (UnitId, Country, YearColumn, Environment, Placement)
VALUES
    (1, 'United States', 1991, 'Windows', 100),
    (1, 'United States', 1990, 'Windows', 100)

INSERT INTO @CurveAssignments
    (UnitId, Country, YearColumn, Environment, RateId)
VALUES
    (1, 'United States', 1991, 'Windows', 1)

INSERT INTO @CurveDefinitions
    (RateId, YearOffset, Rate)
VALUES
    (1, 0, 1),
    (1, 1, 0.5)

-- Installed base for one unit/country/environment in 1991.
-- Each term is one year's placement times the curve rate at the matching offset.
SELECT
    P.UnitId,
    P.Country,
    P.YearColumn,
    -- Offset 0: this year's placement times the rate at YearOffset 0
    P.Placement *
    (
        SELECT
            Rate
        FROM
            @CurveDefinitions CD
            INNER JOIN @CurveAssignments CA ON
                CD.RateId = CA.RateId
        WHERE
            CA.UnitId = P.UnitId
            AND CA.Environment = P.Environment
            AND CA.Country = P.Country
            AND CA.YearColumn = P.YearColumn - 0
            AND CD.YearOffset = 0
    )
    +
    -- Offset 1: last year's placement times the rate at YearOffset 1
    (
        SELECT
            Placement
        FROM
            @Placements PP
        WHERE
            PP.UnitId = P.UnitId
            AND PP.Environment = P.Environment
            AND PP.Country = P.Country
            AND PP.YearColumn = P.YearColumn - 1
    )
    *
    (
        SELECT
            Rate
        FROM
            @CurveDefinitions CD
            INNER JOIN @CurveAssignments CA ON
                CD.RateId = CA.RateId
        WHERE
            CA.UnitId = P.UnitId
            AND CA.Environment = P.Environment
            AND CA.Country = P.Country
            AND CA.YearColumn = P.YearColumn
            AND CD.YearOffset = 1
    ) [Installed Base - 1991] -- alias matches the year filtered on below
FROM
    @Placements P
WHERE
    P.UnitId = 1
    AND P.Country = 'United States'
    AND P.YearColumn = 1991
    AND P.Environment = 'Windows'
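
As a cross-check on what the query above should return for the sample data, here is a minimal Python sketch that mirrors its logic with hash lookups in place of correlated subqueries. It is an illustration, not the approach used in this post. It assumes, as the query does, that the curve is the one assigned to the target year; the function name `installed_base` and the dictionary shapes are hypothetical.

```python
from collections import defaultdict


def installed_base(placements, assignments, curves):
    """placements:  {(unit, env, country, year): units placed}
    assignments: {(unit, env, country, year): rate_id}
    curves:      {rate_id: {year_offset: rate}}
    Returns      {(unit, env, country, year): installed base}."""
    # Index placements by series so each lookup below is a dict hit, not a scan.
    by_series = defaultdict(dict)
    for (unit, env, country, year), placed in placements.items():
        by_series[(unit, env, country)][year] = placed

    result = {}
    for (unit, env, country, year), rate_id in assignments.items():
        series = by_series.get((unit, env, country), {})
        curve = curves.get(rate_id, {})
        # Placement from `offset` years earlier, weighted by the curve rate;
        # years with no placement row contribute nothing.
        result[(unit, env, country, year)] = sum(
            series.get(year - offset, 0.0) * rate
            for offset, rate in curve.items()
        )
    return result


# Same rows as the sample INSERT statements in this post.
placements = {
    (1, "Windows", "United States", 1990): 100.0,
    (1, "Windows", "United States", 1991): 100.0,
}
assignments = {(1, "Windows", "United States", 1991): 1}
curves = {1: {0: 1.0, 1: 0.5}}

ib = installed_base(placements, assignments, curves)
print(ib[(1, "Windows", "United States", 1991)])  # 150.0
```

Unlike the SQL, which yields NULL when a prior-year placement row is missing, this sketch treats a missing placement as zero; that is a design choice of the sketch, not something stated in the post.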
Looks like this might turn out to be a case where asking the question leads to the right answer. It turns out the answer mostly lies in the query I’d given above, which was very inefficient. I’ve been able to get load times close to what I’m looking for just by optimizing the query as below.