I have a recursive immutable data structure in OCaml, which can be simplified to something like this:
type expr =
  {
    eexpr : expr_expr;
    some_other_complex_field : a_complex_type;
  }
and expr_expr =
  | TInt of int
  | TSum of (expr * expr)
  | TMul of (expr * expr)
It’s an AST, and sometimes it gets pretty complex (it’s very deep).
There is a recursive function that evaluates an expression. For example:
let rec result expr =
  match expr.eexpr with
  | TInt i -> i
  | TSum (e1, e2) -> result e1 + result e2
  | TMul (e1, e2) -> result e1 * result e2
Now suppose I am mapping an expression to another expression, and I need to constantly check the result of an expr, sometimes more than once for the same expr, and sometimes for expressions that were recently mapped by using the pattern
{ someExpr with eexpr = TSum(someExpr, otherExpr) }
Now, the result function is very lightweight, but running it many times on a deep AST is wasteful. I know I could cache the value using a Hashtbl, but AFAIK the Hashtbl will only do structural equality, so it will need to traverse my long AST anyway.
I know the best option would be to include a (probably immutable) “result” field in the expr type, but I can’t.
So is there any way in OCaml to cache a value for an immutable value like this, so I don’t have to recalculate it every time I need it?
Thanks!
You can use the functorial interface (Hashtbl.Make) to control the kind of equality used by the hash table. I believe the semantics of (==) are legitimate for your purposes; i.e., if A == B then f A = f B for any pure function f. So you can cache the results of f A. Then if you find a B that’s physically equal to A, the cached value is correct for B.
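As a rough illustration of the idea above, here is a minimal sketch of a memoized result backed by a table keyed on physical equality. The module name PhysTbl, the cache value, the helper mk, and the use of unit in place of a_complex_type are all assumptions made to keep the example self-contained; they are not from the question.

```ocaml
(* Stand-in for the question's types; [unit] replaces [a_complex_type]
   purely so the sketch compiles on its own (an assumption). *)
type expr = {
  eexpr : expr_expr;
  some_other_complex_field : unit;
}
and expr_expr =
  | TInt of int
  | TSum of (expr * expr)
  | TMul of (expr * expr)

(* Hash table whose keys are compared with physical equality.
   [PhysTbl] is a hypothetical name. *)
module PhysTbl = Hashtbl.Make (struct
  type t = expr
  let equal = ( == )
  (* Structural hash: it does not depend on addresses, so it stays
     valid when the GC moves a value. *)
  let hash = Hashtbl.hash
end)

(* Hypothetical global cache from expressions to their results. *)
let cache : int PhysTbl.t = PhysTbl.create 64

let rec result e =
  match PhysTbl.find_opt cache e with
  | Some v -> v                      (* same physical node seen before *)
  | None ->
    let v =
      match e.eexpr with
      | TInt i -> i
      | TSum (e1, e2) -> result e1 + result e2
      | TMul (e1, e2) -> result e1 * result e2
    in
    PhysTbl.add cache e v;
    v

(* Usage: [mk] is a hypothetical constructor helper. *)
let mk e = { eexpr = e; some_other_complex_field = () }
let two = mk (TInt 2)
let three = mk (TInt 3)
let sum = mk (TSum (two, three))
(* The mapping pattern from the question: a new node reusing old subtrees. *)
let prod = { sum with eexpr = TMul (sum, two) }

let () = Printf.printf "%d\n" (result prod)   (* (2 + 3) * 2 = 10 *)
```

Note that such a table keeps every cached expression alive; calling PhysTbl.reset between passes is one way to release them.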
The downside of pairing (==) with a structural hash function is that the hash function will send all structurally equal objects to the same hash bucket, where they will be treated as distinct objects. If you have a lot of structurally equal objects in the table, you get no benefit from the hashing. The behavior degenerates to a linear search.
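The effect can be seen in a small sketch. The key type (int list) and the names PhysListTbl, make, a, and b are hypothetical, chosen only to keep the example short: two structurally equal but physically distinct keys get the same hash, yet occupy separate entries.

```ocaml
(* Table with physical equality and a structural hash.
   [PhysListTbl] is a hypothetical name; [int list] is an arbitrary key type. *)
module PhysListTbl = Hashtbl.Make (struct
  type t = int list
  let equal = ( == )
  let hash = Hashtbl.hash
end)

(* Each call allocates a fresh list with the same contents. *)
let make () = List.init 3 (fun i -> i)
let a = make ()
let b = make ()

let tbl : string PhysListTbl.t = PhysListTbl.create 16

let () =
  PhysListTbl.replace tbl a "a";
  PhysListTbl.replace tbl b "b"

(* Same structure, so the structural hash is identical... *)
let same_hash = Hashtbl.hash a = Hashtbl.hash b
(* ...but the keys are different heap blocks, so the table keeps both. *)
let physically_distinct = not (a == b)
let entries = PhysListTbl.length tbl
```

With many such keys, every lookup has to walk the shared bucket comparing pointers one by one, which is the linear search described above.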
You can’t define the hash function to work with physical addresses, because the physical addresses can be changed at any time by the garbage collector.
However, if you know your table will only contain relatively few large-ish values, using physical equality might work for you.