I’m seeing some interesting behaviour when using user-defined functions within a SELECT statement.
I have a couple of stored procedures that read and purge data from a single table. These stored procedures are called by multiple sources.
From what I have observed, the user-defined functions appear to be evaluated at arbitrary times, not always during (or immediately after) the execution of the SELECT statement they are used in.
For example, in one stored procedure I have a SELECT statement that looks something like this:
SELECT Something, MyFunction(Something) FROM Somewhere;
This is followed by a call to another stored procedure, which purges data from the table. The amount of data purged is governed by another table that stores the maximum ID read, so that a purge does not delete any data that has not yet been read by another executing instance of the stored procedure.
In my test code, MyFunction simply returns the number of rows in the table Somewhere, so I would expect it to always equal the number of rows that the SELECT statement returns. However, when I run two instances of this stored procedure, I get results like this:
First query instance:

Something  MyFunction(Something)
---------  ---------------------
A                              3
B                              3
C                              3

Second query instance:

Something  MyFunction(Something)
---------  ---------------------
A                              0
B                              0
C                              0
Why does the second query return all of the rows, while the user-defined function that operates on the same table reports that there are no rows left in the table?
Is there any way to ensure that the second query instance is consistent, that is, that the user-defined function still sees the same data that the parent stored procedure is seeing?
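In general, the problem you are seeing arises because Oracle’s multi-version read consistency guarantees only that a single SQL statement sees a consistent view of the data, as of the moment that statement started. It does not guarantee that the SQL statements issued by a function called from the original statement see the same data: under the default READ COMMITTED isolation level, each of those recursive statements gets its own read-consistent view as of the moment it starts, so it can see rows that other sessions committed (or deleted) after the original statement began.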
In practical terms, that means that a query which computes the count within the statement itself, for example with the analytic COUNT(*) OVER (), will always return the correct answer (3 if the query returns 3 rows). If you put exactly the same logic in a function that runs its own SELECT COUNT(*) against the table, however, a SQL statement that calls that function for each row will not necessarily return a value that matches the number of rows the statement itself sees (nor will it necessarily return the same value for every row). You can see this in action if you build a delay into the function, so that you have time to modify the data from a separate session.
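As a sketch of that contrast in Oracle SQL, assuming the question's placeholder table somewhere with a VARCHAR2 column something, and using count_rows as a hypothetical name for a function that does what the question's MyFunction does:

```sql
-- Counting inside the statement: COUNT(*) OVER () is computed from the
-- same read-consistent snapshot as the rows themselves, so every row
-- shows the number of rows this query actually returns.
SELECT something,
       COUNT(*) OVER () AS row_count
  FROM somewhere;

-- The same logic moved into a function (hypothetical name count_rows).
-- The parameter is ignored; it only mirrors MyFunction(Something).
-- The SELECT inside is a separate, recursive statement, so under
-- READ COMMITTED it gets its own snapshot every time it is called.
CREATE OR REPLACE FUNCTION count_rows (p_something IN VARCHAR2)
  RETURN NUMBER
IS
  l_cnt INTEGER;
BEGIN
  SELECT COUNT(*)
    INTO l_cnt
    FROM somewhere;
  RETURN l_cnt;
END count_rows;
/

-- Calling the function once per row: the values are not guaranteed to
-- match the number of rows this statement returns.
SELECT something,
       count_rows(something) AS row_count
  FROM somewhere;
```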
First, create a function that counts the rows and then sleeps for 10 seconds, which adds a 10-second delay for every row the query returns.
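A minimal setup for the demonstration might look like this. The scratch table contents, the function name count_rows_slowly, and the use of DBMS_LOCK.SLEEP are assumptions for illustration; DBMS_LOCK.SLEEP requires EXECUTE privilege on DBMS_LOCK, and on Oracle 18c and later DBMS_SESSION.SLEEP should work as well.

```sql
-- Hypothetical scratch table with three rows, mirroring the question's output.
CREATE TABLE somewhere (something VARCHAR2(10));
INSERT INTO somewhere (something) VALUES ('A');
INSERT INTO somewhere (something) VALUES ('B');
INSERT INTO somewhere (something) VALUES ('C');
COMMIT;

-- Hypothetical demonstration function: counts the rows, then sleeps for
-- 10 seconds before returning, so each row the calling query returns
-- adds roughly 10 seconds to its run time.
CREATE OR REPLACE FUNCTION count_rows_slowly (p_something IN VARCHAR2)
  RETURN NUMBER
IS
  l_cnt INTEGER;
BEGIN
  -- Under READ COMMITTED this SELECT takes a fresh snapshot on every call.
  SELECT COUNT(*)
    INTO l_cnt
    FROM somewhere;

  DBMS_LOCK.SLEEP(10);
  RETURN l_cnt;
END count_rows_slowly;
/
```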
Now, if I start a query that calls the function for every row in session 1, then switch over to session 2 and insert (and commit) a new row, you can see that the function sees the newly committed row during its second execution, even though the SQL statement itself only sees the original 3 rows.
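The two sessions might run something like the following. This is a sketch assuming a client such as SQL*Plus, a table somewhere that starts with three rows, and count_rows_slowly, a hypothetical function that counts the rows in somewhere and then sleeps for 10 seconds; the value 'D' is arbitrary.

```sql
-- Session 1: start the query. With three rows and a 10-second sleep
-- per call, it takes about 30 seconds to finish.
SELECT something,
       count_rows_slowly(something) AS row_count
  FROM somewhere;

-- Session 2, while session 1 is still running: add and commit a row.
INSERT INTO somewhere (something) VALUES ('D');
COMMIT;

-- Back in session 1: the statement returns only the rows that existed
-- when it started, but function calls made after session 2's COMMIT
-- can count the new row as well.
```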
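You can avoid that problem by switching your session to the serializable transaction isolation level before executing the SQL statement. Because a serializable transaction sees only changes that were committed before the transaction began (plus its own changes), every statement in it, including the queries the function runs, works from the same snapshot. For example,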
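in session 1, set the transaction isolation level to serializable and start the same query; then, in session 2, insert and commit another new row. When session 1 returns about 40 seconds later (10 seconds for each of the 4 rows now in the table), everything is consistent: the function returns the same count for every row, and that count matches the number of rows the query returned.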
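The serializable variant might look like this, again a sketch assuming the hypothetical count_rows_slowly function and the somewhere table; the value 'E' is arbitrary.

```sql
-- Session 1: SET TRANSACTION must be the first statement of a
-- transaction (otherwise Oracle raises ORA-01453), so end any open
-- transaction first.
COMMIT;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Every statement in this transaction, including the SELECT inside
-- count_rows_slowly, now reads data as of the start of the transaction.
SELECT something,
       count_rows_slowly(something) AS row_count
  FROM somewhere;

-- Session 2, while session 1 is still running:
INSERT INTO somewhere (something) VALUES ('E');
COMMIT;

-- Session 1: end the serializable transaction so that later statements
-- see newly committed data again.
COMMIT;

-- Alternative: make every transaction in the session serializable.
-- ALTER SESSION SET ISOLATION_LEVEL = SERIALIZABLE;
```

If the same serializable transaction goes on to run the purge, note that Oracle raises ORA-08177 ("can't serialize access for this transaction") when a serializable transaction tries to change a row that another transaction changed and committed after the serializable transaction began, so the calling code may need to handle and retry that error.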