I have a vector of pairs representing hops between nodes which I would like to collapse when there are cycles (as in aggregate time between the hops in a cycle to display it as just one). So, for example, The path A –> B –> C –> D –> B –> C –> D –> E traverses the subpath B–> C –> D twice, so in my structure I would have something like:
(A,B,1)(B,C,3)(C,D,2)(D,B,4)(B,C,5)(C,D,8)(D,E,6)
which I would ideally reduce to:
(A,B,1)(B,C,3+5)(C,D,2+8)(D,E,6)
storing also the 4 from D to B (loop-back edge time) to aggregate separately and be able to display B –> C –> D in a condensed way (the aggregated edge times and an aggregated loop-back time for all the D–>B instances alongside a count of how many times we looped)
How can I go about this?
I would go for a suffix array or suffix tree.
Just produce tokens (in this case (from, to) ) and put it in the Suffix Array or Suffix Tree. Then you can get the pairs.
Simple but not efficietn way:
produce tokens, produce all suffixes of this tokenstream. sort them. and then you have all common subsequences in that sorted list near each other (tokenstream length – tokenstream suffix length = position)
You could do the sorting with quicksort for example, or you just looking for a suffix array implemantation. Suffix Arrays could be constructed in O(n) and with O(n) space. And finding maximal pairs/repeats (thats what you want) can be done in O(n+k) (where n = tokennumber, and k = cycles in your list)
Perhaps this helps. Then you could produce a tokenstream like: ABCDBCDE
Quick and dirty solution is here