I need to build a partial Inverted Index. Something like:
l = {{x, {h, a, b, c}}, {y, {c, d, e}}}
iI[l]
(*
-> {{a, {x}}, {b, {x}}, {c, {x, y}}, {d, {y}}, {e, {y}}, {h, {x}}}
*)
I think it is pretty clear what it does. In the input list, the {x, y …} are unique, while the {a, b, c, ..} are not. The output ought to be ordered by #[[1]].
Right now, I am doing this:
iI[list_List] := {#, list[[Position[list, #][[All, 1]]]][[All, 1]]} & /@
(Union@Flatten@Last@Transpose@list)
But it looks too convoluted for such an easy task, seems too slow, and I should be able to cope with Legion.
A test drive to compare your results:
words = DictionaryLookup[];
abWords = DictionaryLookup["ab" ~~ ___];
l = {#, RandomChoice[abWords, RandomInteger[{1, 30}]]} & /@ words[[1 ;; 3000]];
First@Timing@iI[l]
(*
-> 5.312
*)
So, any ideas for a speedup?
Seems a classic task for Reap–Sow (improvement in the final version due to @Heike):
Then,
and
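As a rough illustration of the Reap–Sow idea only (a minimal sketch, not necessarily the code of this answer): each key is sown once for every word in its list, with the word used as the tag, and Reap then gathers the keys per tag. The name iISow is hypothetical, and the Union call is an added assumption so that a word repeated within one entry does not record its key twice.

```mathematica
(* iISow is a hypothetical name; a sketch of the Reap-Sow approach *)
iISow[list_List] :=
 SortBy[
  Reap[
    (* sow each key once per distinct word in its list; words act as tags *)
    Scan[Sow[First[#], Union[Last[#]]] &, list],
    _,     (* reap every tag *)
    List   (* build {word, {keys ...}} for each distinct tag *)
  ][[2]],
  First]   (* order the result by the word, i.e. by #[[1]] *)
```

On the small example from the question this should reproduce the requested output:

```mathematica
l = {{x, {h, a, b, c}}, {y, {c, d, e}}};
iISow[l]
(* {{a, {x}}, {b, {x}}, {c, {x, y}}, {d, {y}}, {e, {y}}, {h, {x}}} *)
```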
EDIT
Here is an alternative version with a similar (slightly worse) performance:
It is interesting that Reap–Sow here gives an even slightly faster solution than the one based on structural operations.
EDIT 2
Just as an illustration, for those who prefer rule-based solutions, here is one based on a combination of Dispatch and ReplaceList:
It is about 2-3 times slower than the other two, though.