I wrote the following Haskell code, which takes a triplet (x,y,z) and a list of triplets [(Int,Int,Int)] and checks whether the list contains a triplet (a,b,c) such that x == a and y == b. If it does, I just need to update c = c + z; if there is no such triplet in the list, I just add the triplet to the list.
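insertEdge :: (Int,Int,Int) -> [(Int, Int, Int)] -> [(Int, Int, Int)]
insertEdge (x,y,z) cs =
  -- No triplet matches (x,y) exactly when filtering out matches removes nothing.
  if (length [(a,b,c) | (a,b,c) <- cs, a /= x || b /= y]) == (length cs)
    then ((x,y,z):cs)
    -- Otherwise add z to the third component of the matching triplet.
    else [if (a == x && b == y) then (a,b,c+z) else (a,b,c) | (a,b,c) <- cs]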
After profiling my code, it appears that this function takes 65% of the execution time.
How can I re-write my code to be more efficient?
The first thing that jumps out at me is the conditional:

    if (length [(a,b,c) | (a,b,c) <- cs, a /= x || b /= y]) == (length cs)
length examines the entire list, so in the worst-case scenario (updating the last element) your function traverses the list three times: once for the length of the filtered list, once for the length of cs, and once to find the element to update.

However, even getting rid of the extra traversals, the best you can do with the function as written will usually require a traversal of most of the list. From the name of the function and how much time was being spent in it, I’m guessing you’re calling this repeatedly to build up a data structure? If so, you should strongly consider using a more efficient representation.
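For reference, the extra traversals can be removed with one recursive pass over the list. This is only a sketch, not a drop-in replacement: it appends a new triplet at the end instead of consing it onto the front, and it is still linear in the length of the list on every call.

```haskell
-- Single-pass variant of the list-based insertEdge (still O(n) per call).
insertEdge :: (Int, Int, Int) -> [(Int, Int, Int)] -> [(Int, Int, Int)]
insertEdge (x, y, z) = go
  where
    -- Reached the end without finding a match: add the new triplet here.
    go [] = [(x, y, z)]
    go (e@(a, b, c) : rest)
      -- Match found: update it and stop traversing.
      | a == x && b == y = (a, b, c + z) : rest
      | otherwise        = e : go rest
```

Because it stops at the first match, updating an element near the front is cheap, but inserting a new edge still walks the whole list.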
For instance, a quick and easy improvement would be to use Data.Map, with the first two elements of the triplet in a 2-tuple as the key and the third element as the value. That way you can avoid making so many linear-time lookups and redundant traversals.

As a rule of thumb, lists in Haskell are only an appropriate data structure when all you do is either walk sequentially down the list a few times (ideally, just once) or add/remove from the head of the list (i.e., using it like a stack). If you’re searching, filtering, updating elements in the middle, or, worst of all, indexing by position, using lists will only end in tears.
Here’s a quick example, if that helps:
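A minimal sketch of such a function, assuming edges are passed as ((x, y), z) pairs. The name incEdge and the argument order match the fold mentioned below; the EdgeMap type synonym and the helper f are made up for illustration.

```haskell
import qualified Data.Map as M

-- Hypothetical synonym: (x, y) is the key, the accumulated z is the value.
type EdgeMap = M.Map (Int, Int) Int

-- Insert the edge if its key is absent, otherwise add z to the stored value.
incEdge :: EdgeMap -> ((Int, Int), Int) -> EdgeMap
incEdge m (k, z) = M.alter f k m
  where
    f Nothing  = Just z        -- key not present: insert z
    f (Just c) = Just (c + z)  -- key present: sum the values
```

Each call is logarithmic in the number of distinct keys, rather than linear in the length of a list.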
The alter function is just insert/update/delete all rolled into one. Here it inserts the key into the map if it’s not there, and sums the values if the key does exist. To build up a structure incrementally, you can do something like foldl incEdge M.empty edgeList. Testing this out, for a few thousand random edges your version with a list takes several seconds, whereas the Data.Map version is pretty much immediate.
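If the edges are still stored as triplets, the incremental build can also be sketched with insertWith and a strict left fold. The name buildEdges is hypothetical, and the use of Data.Map.Strict and foldl' is my assumption to avoid building up thunks, not something the measurement above depends on.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- Fold a list of (x, y, z) triplets into a map from (x, y) to the summed z.
buildEdges :: [(Int, Int, Int)] -> M.Map (Int, Int) Int
buildEdges = foldl' step M.empty
  where
    -- insertWith (+) inserts z for a new key and adds it to an existing value.
    step m (x, y, z) = M.insertWith (+) (x, y) z m
```

The same map can be built in one call with M.fromListWith (+) applied to a list of ((x, y), z) pairs.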