I’m trying to optimise a complex query in PostgreSQL 9.1.2, which calls some functions. These functions are marked STABLE or IMMUTABLE and are called several times with the same arguments in the query. I assumed PostgreSQL would be smart enough to only call them once for each set of inputs – after all, that’s the point of STABLE and IMMUTABLE, isn’t it? But it appears that the functions are being called multiple times. I wrote a simple function to test this, which confirms it:
CREATE OR REPLACE FUNCTION test_multi_calls1(one integer)
RETURNS integer
AS $BODY$
BEGIN
RAISE NOTICE 'Called with %', one;
RETURN one;
END;
$BODY$ LANGUAGE plpgsql IMMUTABLE;
WITH data AS
(
SELECT 10 AS num
UNION ALL SELECT 10
UNION ALL SELECT 20
)
SELECT test_multi_calls1(num)
FROM data;
Output:
NOTICE: Called with 10
NOTICE: Called with 10
NOTICE: Called with 20
Why is this happening and how can I get it to only execute the function once?
The following extension of your test code is informative:
OUTPUT:
Here we can see that while in the select-list the immutable function was called multiple times, in the where clause it was called once, while the volatile was called thrice.
The important thing isn’t that PostgreSQL will only call a
STABLEorIMMUTABLEfunction once with the same data – your example clearly shows that this is not the case – it’s that it may call it only once. Or perhaps it will call it twice when it would have to call a volatile version 50 times, and so on.There are different ways in which stability and immutability can be taken advantage of, with different costs and benefits. To provide the sort of saving you are suggesting it should make with select-lists it would have to cache the results, and then lookup each argument (or list of arguments) in this cache before either returning the cached result or calling function on a cache-miss. This would be more expensive than calling your function, even in the case where there was a high percentage of cache-hits (there could be 0% cache hits meaning this “optimisation” did extra work for absolutely no gain). It could store maybe just the last parameter and result, but again that could be completely useless.
This is especially so considering that stable and immutable functions are often the lightest functions.
With the where clause however, the immutability of
test_multi_calls1allows PostgreSQL to actually restructure the query from the plain meaning of the SQL given:To a different query plan entirely:
This is the sort of use that PostgreSQL makes of STABLE and IMMUTABLE – not the caching of results, but the rewriting of queries into different queries which are more efficient but give the same results.
Note also that test_multi_calls1(30) is called before test_multi_calls2(40) no matter what order they appear in the where clause. This means that if the first call results in no rows being returned (replace
= 30with= 31to test) then the volatile function won’t be called at all – again regardless to which is on which side of theand.This particular sort of rewriting depends upon immutability or stability. With
where test_multi_calls1(30) != numquery re-writing will happen for immutable but not for merely stable functions. Withwhere test_multi_calls1(num) != 30it won’t happen at all (multiple calls) though there are other optimisations possible:Expressions containing only STABLE and IMMUTABLE functions can be used with index scans. Expressions containing VOLATILE functions cannot. The number of calls may or may not decrease, but much more importantly the results of the calls will then be used in a much more efficient way in the rest of the query (only really matters on large tables, but then it can make a massive difference).
In all, don’t think of volatility categories in terms of memoisation, but rather in terms of giving PostgreSQL’s query planner opportunities to restructure entire queries in ways that are logically equivalent (same results) but much more efficient.