So I was thinking about a graph distance algorithm tonight, and came up with this
while I was driving in the car:
module GraphDistance where
import Data.Map
distance :: (Ord a) => a -> Map a [a] -> Map a Int
distance a m = m'
where m' = mapWithKey d m
d a' as = if a == a' then 0 else 1 + minimum (Prelude.map (m'!) as)
At first, I was fairly proud of myself, since it seemed so elegant.
But then I realized it wouldn’t work – the corecursion might get stuck
in a loop.
I had to code it up to convince myself:
ghci> distance 1 $ fromList [(0,[1]),(1,[0,2]),(2,[1,3]),(3,[2])]
fromList [(0,1),(1,0),(2,^CInterrupted.
But now I think I’ve pretty much thought it through.
Is there a list of common errors and anti-patterns
when dealing with corecursive data that I can go read up on,
so I can train my brain to think corecursively? Experience
has trained me fairly well for thinking through non-corecursive
data, but I’d like to learn from others mistakes if I can.
Well, there’s really only one fundamental mistake when dealing with corecursive data, and that’s carelessly using recursion on it!
Corecursion implies that data is being generated incrementally in some sense. Your graph distance function is instructive here, because it seems like it should work, so think about where the incremental part should be: The starting point is a distance of 0 from a node to itself, otherwise one greater than the minimum distance among its own immediate neighbors. Thus, we would expect each distance value to be incremental, which means we need them to be suitably corecursive themselves.
The recursion at issue, then, is occurring due to the combination of
(+)andminimum: when finding the minimum,1will always be less than1 + n, without needing to worry about whatnis… but there’s no way to compareInts without computing the entire value.In short, the algorithm expects to be able to compare how many times
(1 +)was applied to0only as far as needed; that is to say, it wants lazy natural numbers defined using “zero” and “successor”.Behold:
And then in GHCi:
Proof that your algorithm works[0]; your implementation of it was just incorrect.
Now, as a slight variation, let’s apply your algorithm to a different graph:
…what do we expect this to do? What is the distance from node 1 for nodes 2 or 3?
Running it in GHCi has the obvious result:
Nevertheless, the algorithm works correctly on this graph. Can you see the problem? Why did it hang in GHCi?
In summary, you need to clearly distinguish between two forms that can’t be mixed freely:
Both forms can be transformed in a structure-agnostic way (e.g.,
mapworks on finite and infinite lists both). Codata can be consumed incrementally, driven by a corecursive algorithm; data can be generated recursively, limited by a recursive algorithm.What you can’t do is consume codata recursively (e.g., left folding an infinite list) or generate data corecursively (rare in Haskell, due to laziness).
[0]: Actually, it will fail on some inputs (e.g., some disconnected graphs), but that’s a different issue.