I receive an iterator as an argument and I would like to iterate over its values twice.
public void reduce(Pair<String,String> key, Iterator<IntWritable> values,
Context context)
Is it possible? How?
The signature is imposed by the framework I am using (namely Hadoop).
— edit —
It turns out that the real signature of the reduce method takes an Iterable. I was misled by this wiki page (which is actually the only non-deprecated, but wrong, wordcount example I found).
We have to cache the values from the iterator if we want to iterate again. At least we can combine the first iteration and the caching:
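A minimal sketch of that idea, not tied to Hadoop: it uses a plain `Iterator<Integer>`, and the class and method names (`TwoPass`, `countAboveMean`) are made up for illustration. The first pass sums the values while copying them into a list, and the second pass runs over the list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, only for illustration.
final class TwoPass {
    /** Counts the values strictly greater than the mean, which needs two passes. */
    static int countAboveMean(Iterator<Integer> values) {
        List<Integer> cache = new ArrayList<>();
        long sum = 0;
        // First pass: consume the iterator once, summing and caching as we go.
        while (values.hasNext()) {
            int v = values.next();
            sum += v;
            cache.add(v);
        }
        if (cache.isEmpty()) {
            return 0;
        }
        double mean = (double) sum / cache.size();
        // Second pass: iterate over the cached copy, not the exhausted iterator.
        int count = 0;
        for (int v : cache) {
            if (v > mean) {
                count++;
            }
        }
        return count;
    }
}
```

For example, `TwoPass.countAboveMean(Arrays.asList(1, 2, 3, 10).iterator())` makes both passes from a single traversal of the iterator. In a Hadoop reducer, as far as I know, the framework may reuse the same `IntWritable` instance between calls to `next()`, so you would probably want to cache a copy (for instance `new IntWritable(value.get())`) or the primitive `int` rather than the object reference. The cache also has to fit in memory.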
(Just to add an answer with code, knowing that you mentioned this solution in your own comment 😉)
Why it's impossible without caching: an Iterator is something that implements an interface, and there is no requirement that the Iterator object actually stores its values. To iterate twice you would have to either reset the iterator (not possible) or clone it (again, not possible).
To give an example of an iterator where cloning or resetting wouldn't make any sense:
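One possible illustration (the class name `RandomIterator` is hypothetical) is an iterator that produces its values on the fly instead of reading them from a collection. Nothing is stored, so there is nothing to rewind to, and a "reset" would just yield a different sequence.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Random;

// Hypothetical iterator that generates values instead of storing them.
final class RandomIterator implements Iterator<Integer> {
    private final Random random = new Random();
    private int remaining;

    RandomIterator(int count) {
        this.remaining = count;
    }

    @Override
    public boolean hasNext() {
        return remaining > 0;
    }

    @Override
    public Integer next() {
        if (remaining <= 0) {
            throw new NoSuchElementException();
        }
        remaining--;
        // Each value is created here and forgotten immediately afterwards.
        return random.nextInt();
    }
}
```

The only way to see the same values a second time is for the caller to keep them, which is the caching approach described above.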