I know how function comparison works in Python 3 (just comparing address in memory), and I understand why.
I also understand that “true” comparison (do functions f and g return the same result given the same arguments, for any arguments?) is practically impossible.
I am looking for something in between. I want the comparison to work on the simplest cases of identical functions, and possibly some less trivial ones:
lambda x : x == lambda x : x # True
lambda x : 2 * x == lambda y : 2 * y # True
lambda x : 2 * x == lambda x : x * 2 # True or False is fine, but must be stable
lambda x : 2 * x == lambda x : x + x # True or False is fine, but must be stable
Note that I’m interested in solving this problem for anonymous functions (lambda), but wouldn’t mind if the solution also works for named functions.
The motivation for this is that inside blist module, it would be nice to verify that two sortedset instances have the same sort function before performing a union, etc. on them.
Named functions are of less interest because I can assume them to be different when they are not identical. After all, suppose someone created two sortedsets with a named function in the key argument. If they intend these instances to be “compatible” for the purposes of set operations, they’d probably use the same function, rather than two separate named functions that perform identical operations.
I can only think of three approaches. All of them seem hard, so any ideas appreciated.
-
Comparing bytecodes might work but it might be annoying that it’s implementation dependent (and hence the code that worked on one Python breaks on another).
-
Comparing tokenized source code seems reasonable and portable. Of course, it’s less powerful (since identical functions are more likely to be rejected).
-
A solid heuristic borrowed from some symbolic computation textbook is theoretically the best approach. It might seem too heavy for my purpose, but it actually could be a good fit since lambda functions are usually tiny and so it would run fast.
EDIT
A more complicated example, based on the comment by @delnan:
# global variable
fields = ['id', 'name']
def my_function():
global fields
s1 = sortedset(key = lambda x : x[fields[0].lower()])
# some intervening code here
# ...
s2 = sortedset(key = lambda x : x[fields[0].lower()])
Would I expect the key functions for s1 and s2 to evaluate as equal?
If the intervening code contains any function call at all, the value of fields may be modified, resulting in different key functions for s1 and s2. Since we clearly won’t be doing control flow analysis to solve this problem, it’s clear that we have to evaluate these two lambda functions as different, if we are trying to perform this evaluation before runtime. (Even if fields wasn’t global, it might have been had another name bound to it, etc.) This would severely curtail the usefulness of this whole exercise, since few lambda functions would have no dependence on the environment.
EDIT 2:
I realized it’s very important to compare the function objects as they exist in runtime. Without that, all the functions that depend on variables from outer scope cannot be compared; and most useful functions do have such dependencies. Considered in runtime, all functions with the same signature are comparable in a clean, logical way, regardless of what they depend on, whether they are impure, etc.
As a result, I need not just the bytecode but also the global state as of the time the function object was created (presumably __globals__). Then I have to match all variables from outer scope to the values from __globals__.
Edited to check whether external state will affect the sorting function as well as if the two functions are equivalent.
I hacked up
dis.disand friends to output to a global file-like object. I then stripped out line numbers and normalized variable names (without touching constants) and compared the result.You could clean this up so
dis.disand friendsyielded out lines so you wouldn’t have to trap their output. But this is a working proof-of-concept for usingdis.disfor function comparison with minimal changes.The output is
True,True,False, andFalseand is stable. The “Polluted” bool is true if the output will depend on external state — either global state or a closure.