This question applies to stored procedures written in SQL-92
i.e. Oracle PL/SQL, SQL Server T-SQL or DB2 SQL
I’m maintaining an 11000 line stored procedure.
I find that by the end of the stored procedure I need to report out 80 columns of data.
There are 3 distinct phases in this stored procedure.
- Data Gathering (copying data from the live tables into the stored procedure intermediate tables)
I need to do data gathering for consistency because the LIVE data (i.e. in the member table) at line 30 might have changed by the time the stored procedure execution gets to line 10,000
Commit state atomicity is maintained here (no commit until all the data needed is copied over) - Calculation (lots of SQL, complicated enough so that Cursors or Views will not do the job)
- Writing back to permanent tables (invoices, AR, payments)
Commit state atomicity is maintained here (no commit until all the data needed is copied over)
The “intermediate” tables are only used in the stored procedure.
They are indexed for joins down the line but do not have
PK/FK referential integrity constraints or unique indexes
as these would slow down execution considerably in addition
to pointing back to LIVE data (that is in flux)
When you get to 80 columns of data that you need to report by the end of a stored
procedure you run up against RDBMS limitations (index limits, memory limits,
SQL join COST limits, out of control paging and data going to virtual
memory/swap when the DB thinks it should use HASH instead of using NESTED LOOPS)
I have normalized LIVE data (that gets written to and read from 24/7 by data entry users)
It occurred to me that the way to optimize the space taken up by intermediate tables
used in the stored procedure (at step 2) would be to find composite primary keys and assign
each a unique id (a surrogate PK) thereby referencing n columns with 1 column. Then, I
would reconstitute this data at the end of step 2 and have it ready to
write back to at the start of step 3. This would add more processing to
step 2 but less data would get copied around. Also debugging would take
more steps (tracing back ids to actual data in the intermediate
table data after execution completes)
Has anyone run into this scenario with lengthy stored procedures?
Has anyone created a surrogate key (replacing a compound PK with a one-column PK) in
intermediate tables that are only used in stored procedures?
Has that paid off in terms of execution time and memory/space used during
execution?
I’ve built a few lenghty SProcs and I’ve always gone for an Indentity column surrogate key. Is it possible to rethink what’s being done and create separate temp tables for each intermediate step?
I’ve had to do this in the past. At the end, I “stitched” all the separate temp tables into my final output.