We have a very complex Oracle package here (over 4,100 lines of code) that’s giving us issues, and I’ve been tasked with tracking down the problem. When we call the execute_filter procedure, we expect to get back 6 tables. However, around 10-15 times a day our code crashes because table index 1 is out of range. When this repros, it seems to repro several times in a row, and then a minute later it works fine again. I have not yet been able to repro this under a debugger to see exactly what the data set is, but I suspect the dataset is just a single empty table.
Digging through the Oracle package is almost impossible, as it’s all one big run-on query with no formatting, no indentation, and pages and pages of code that builds other queries by concatenating strings and whatnot. However, I have a theory about what’s going on.
The execute_filter method calls one or more of dozens of other methods, for example filter_by_areas_name. Each of these methods queries some data and inserts this data into a table called tpm_temp_filter_project. An example of this is:
FOR I IN 1..areaState.COUNT LOOP
INSERT INTO tpm_temp_filter_project
(
projectid,
versionid
)
SELECT .. --Grabs the data it needs from other tables
At the end of each of these filter calls, we call a procedure called populate_result_table which copies stuff in tpm_temp_filter_project into another table and then does:
EXECUTE IMMEDIATE 'truncate table tpm_temp_filter_project';
So, my theory is that if two people run this query at the same time, the rows from these “holder” tables are getting truncated prematurely while another query still needs them.
What’s the best way to prevent this sort of thing from happening? One idea I had would be to put:
LOCK TABLE tpm_temp_filter_project IN EXCLUSIVE MODE;
at the very beginning of execute_filter, and a COMMIT; as the very last line. In theory, this should allow only one person to run the command at a time, and pending requests will “block” until the first filter is done. I haven’t tried this yet, but I have a few questions.
- Is this a good theory as to what’s going on?
- Is this a good fix, or is there a better solution to this issue?
I appreciate any insight into this problem.
UPDATE:
Here’s the schema for the temp table:
CREATE GLOBAL TEMPORARY TABLE TPMDBO.TPM_TEMP_FILTER_PROJECT (
PROJECTID NUMBER NULL,
VERSIONID NUMBER NULL
)
ON COMMIT DELETE ROWS
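As a sanity check on the cross-session theory: rows in an Oracle global temporary table are visible only to the session that inserted them, so one session should not be able to see or remove another session's rows. A minimal sketch of how that could be verified from two separate connections (the inserted values are made up for illustration):

```sql
-- Session A: insert a row but do not commit
-- (ON COMMIT DELETE ROWS would clear it at commit).
INSERT INTO tpm_temp_filter_project (projectid, versionid)
VALUES (1, 1);  -- illustrative values only

SELECT COUNT(*) FROM tpm_temp_filter_project;  -- session A sees its own row: 1

-- Session B (a different connection, run while session A is still open):
SELECT COUNT(*) FROM tpm_temp_filter_project;  -- 0: session A's row is not visible here

DELETE FROM tpm_temp_filter_project;           -- affects only session B's own (empty) data

-- Back in session A:
SELECT COUNT(*) FROM tpm_temp_filter_project;  -- still 1
```

If that holds, the cleanup in populate_result_table can only remove the calling session's own rows. Note also that with ON COMMIT DELETE ROWS, any COMMIT clears the session's rows in this table.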
ANOTHER UPDATE:
This does NOT appear to be a conflict between two sessions. If I change:
EXECUTE IMMEDIATE 'truncate table tpm_temp_filter_project';
to:
DELETE tpm_temp_filter_project;
then the error still occurs. Even if I comment out that line completely, the error still eventually occurs. There is nothing else in that package body that deletes, truncates, or modifies any other data whatsoever.
Second piece of evidence: I finally did repro the error under the Visual Studio debugger. The DataSet in .NET is completely empty. There’s one table called table that has zero columns. If this were an issue with one session deleting data in these temp tables, I would expect a valid schema with zero rows, or perhaps rows from the wrong session.
The issue ended up being due to package state being reset intermittently. After several days of debugging (as the issue only repro’ed on production servers), I finally found the cause.
A procedure was being called in code which stored some data in a local variable. After that, some C# code ran that internally ended up calling Open() on the database connection again (even though the connection was already open). Rather than being a no-op, calling Open() again seems to close and re-open the connection to the database, at least with the Oracle drivers we use. 99 out of 100 times, it would just pick the same connection from the connection pool and continue to work fine. However, every so often it would pick a different connection, our session ID would change, and the package state would be lost.
Commenting out that Open() call fixed the problem immediately.
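For anyone chasing something similar, here is a rough sketch of why a changed session loses the state, assuming the data was held in a package-level variable. The package demo_state and its members are hypothetical names used only for illustration and are not part of our actual package:

```sql
-- Hypothetical package used only to illustrate session-scoped package state.
CREATE OR REPLACE PACKAGE demo_state AS
  PROCEDURE set_value(p_value IN NUMBER);
  FUNCTION get_value RETURN NUMBER;
END demo_state;
/

CREATE OR REPLACE PACKAGE BODY demo_state AS
  g_value NUMBER;  -- package-level variable: each session gets its own copy

  PROCEDURE set_value(p_value IN NUMBER) IS
  BEGIN
    g_value := p_value;
  END set_value;

  FUNCTION get_value RETURN NUMBER IS
  BEGIN
    RETURN g_value;
  END get_value;
END demo_state;
/

-- Session A: store a value, then read it back.
BEGIN
  demo_state.set_value(42);
END;
/
SELECT demo_state.get_value FROM dual;           -- 42 in session A

-- Session B (a different connection): the variable was never set here.
SELECT demo_state.get_value FROM dual;           -- NULL

-- Logging the session ID before each call makes a silent session switch visible.
SELECT SYS_CONTEXT('USERENV', 'SID') FROM dual;
```

Comparing the SID logged before the first call with the one logged before the failing call would be one way to confirm that the connection was swapped in between.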