I’ve been struggling to get a Prüfer sequence algorithm implemented in Clojure, primarily as a mental exercise. While what I have works correctly, I was wondering if there’s a more concise way of removing something from a map and a collection of vectors. For example, if I wanted to remove all 5’s from the following:
{1 [2 3], 2 [1 4], 3 [1 5], 4 [2], 5 [3]}
to be left with:
{1 [2 3], 2 [1 4], 3 [1], 4 [2]}
I suspect the structure I have is… suboptimal… but it was the simplest way I could think of to represent edge sets. Any thoughts much appreciated!
TLDR: Use sets instead of vectors in this case.
First, to repeat the representation you have chosen: An undirected graph is represented as a map from nodes (represented by numbers) to vectors of nodes. For each edge x–y there exists a map entry from x to a vector containing y and an entry from y to a vector containing x. To remove a node z and all its edges, this invariant has to be maintained by both removing the map entry that has z as its key and by removing z from the vectors of all other entries.
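To make the invariant concrete, here is a minimal sketch that checks it on the map from the question. The name `symmetric?` is a hypothetical helper introduced only for illustration; it is not part of the original code.

```clojure
;; Example graph from the question: node -> vector of neighbours.
(def graph {1 [2 3], 2 [1 4], 3 [1 5], 4 [2], 5 [3]})

;; Hypothetical helper: true when every edge x-y is recorded in both directions.
(defn symmetric? [g]
  (every? (fn [[x neighbours]]
            (every? (fn [y] (some #{x} (get g y))) neighbours))
          g))

(symmetric? graph)
;; => true

;; Removing only the map entry for 5 leaves a dangling 5 in node 3's vector,
;; so the invariant no longer holds.
(symmetric? {1 [2 3], 2 [1 4], 3 [1 5], 4 [2]})
;; => false
```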
The first operation, removing a map entry given its key, is simple. The map without the entry is given by the expression (dissoc m 5), where m is the map in your example and 5 is the node you wish to remove from the graph.
The second operation, removing an element from all the vectors that are the values of a map, is a compound operation consisting of 1) doing something for all values of a map and 2) removing an element from a vector. Assuming 2) is solved (let’s call it remove-from-vector), we can do 1) in a number of ways.
Problem 2) raises some questions, since there is no built-in Clojure function that removes an element from the middle of a vector. The reason is that it is not something a vector can do efficiently. Imagine that you have a vector of 10000 elements and you want to remove the one with index 5000. To construct the new vector, the 4999 elements after it have to be conj’ed onto the end of the subvector containing indices 0 to 4999. Clearly, we would be better off with a data structure that handles removal of an arbitrary element in a better way.
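As a rough sketch of the two steps with the vector representation: `remove-from-vector` below is one possible definition (it rebuilds the whole vector, which is exactly the linear cost described above), and the three `remove-everywhere-*` functions are hypothetical names showing a few ways to apply it to every value of a map. They are illustrations, not necessarily the variants originally intended.

```clojure
(def graph {1 [2 3], 2 [1 4], 3 [1 5], 4 [2], 5 [3]})

;; One possible remove-from-vector: copies every element except x, so it is O(n).
(defn remove-from-vector [v x]
  (filterv #(not= x %) v))

;; Variant A: fold over the entries with reduce-kv.
(defn remove-everywhere-a [m x]
  (reduce-kv (fn [acc k v] (assoc acc k (remove-from-vector v x))) {} m))

;; Variant B: build key/value pairs with for and pour them into a map.
(defn remove-everywhere-b [m x]
  (into {} (for [[k v] m] [k (remove-from-vector v x)])))

;; Variant C: pair the keys with the transformed values.
(defn remove-everywhere-c [m x]
  (zipmap (keys m) (map #(remove-from-vector % x) (vals m))))

;; Drop node 5's own entry first, then remove 5 from the remaining vectors.
(remove-everywhere-a (dissoc graph 5) 5)
;; => {1 [2 3], 2 [1 4], 3 [1], 4 [2]}
```

All three variants return equal maps; which one reads best is largely a matter of taste.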
To address this issue, we can redesign the representation. There is in fact a data structure that handles removal better: the set. If a set contains 10000 values with evenly distributed hash values, the height of its internal tree is 3. This means that removing an element from the set requires about 3 internal steps rather than something on the order of 10000. If we use sets instead of vectors, the example map looks like this:
{1 #{2 3}, 2 #{1 4}, 3 #{1 5}, 4 #{2}, 5 #{3}}
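If a graph already exists in the vector form, a conversion along these lines would yield the set-based map. The helper name `vectors->sets` is hypothetical and used only for this sketch.

```clojure
(def graph {1 [2 3], 2 [1 4], 3 [1 5], 4 [2], 5 [3]})

;; The same graph written directly with sets of neighbours.
(def graph-with-sets {1 #{2 3}, 2 #{1 4}, 3 #{1 5}, 4 #{2}, 5 #{3}})

;; Hypothetical helper: turn each neighbour vector into a set.
(defn vectors->sets [g]
  (into {} (for [[k v] g] [k (set v)])))

;; Sets compare by content, so the converted map equals the literal one.
(= graph-with-sets (vectors->sets graph))
;; => true
```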
For sets, the operation corresponding to our remove-from-vector above is disj, the reverse of conj. With the new representation, the final solution combines dissoc on the map with disj on every remaining set.
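A minimal sketch of such a solution follows, assuming the set-based map shown earlier. The names `remove-node` and `remove-node-via-neighbours` are hypothetical; the second variant is an optional shortcut that relies on the invariant that every edge is stored in both directions.

```clojure
;; Set-based graph: node -> set of neighbours.
(def graph {1 #{2 3}, 2 #{1 4}, 3 #{1 5}, 4 #{2}, 5 #{3}})

;; Drop z's own entry, then disj z from every remaining neighbour set.
(defn remove-node [g z]
  (reduce-kv (fn [acc k neighbours] (assoc acc k (disj neighbours z)))
             {}
             (dissoc g z)))

;; Alternative: only visit z's neighbours, since by the invariant
;; they are the only entries that can contain z.
(defn remove-node-via-neighbours [g z]
  (reduce (fn [acc n] (update acc n disj z))
          (dissoc g z)
          (get g z)))

(= {1 #{2 3}, 2 #{1 4}, 3 #{1}, 4 #{2}}
   (remove-node graph 5)
   (remove-node-via-neighbours graph 5))
;; => true
```

The first variant touches every entry and works even if the map is not symmetric; the second does less work on large, sparse graphs but assumes the invariant holds.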