As a C# programmer, I have only a sketchy understanding of Java/Scala iterator design.
I am trying to read records lazily (the source may be big) from a RecordReader in a third-party library, and I need to do some additional work every 100 records:
for (group <- reader.iterator.zipWithIndex.grouped(100)) {
  for ((record, i) <- group) {
    println(s"$i|${record.key}")
  }
  // ...
}
This gives me the very last record repeatedly, on every iteration.
If I don’t use grouped, it works fine and I get each record. Am I missing something about lazy streaming or Java iterators?
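For comparison, here is a minimal sketch of the same loop over a plain in-memory iterator. The `records` stand-in and the `lines` helper are hypothetical (they replace `reader.iterator` and the `println`), but the `zipWithIndex.grouped(100)` pipeline is the one from the question. With a well-behaved iterator it yields every record exactly once, indexed 0 to 249, which suggests the problem lies in the iterator the reader hands out rather than in `grouped` itself.

```scala
object GroupedDemo {
  // Hypothetical stand-in for reader.iterator: 250 distinct, immutable records.
  def records: Iterator[String] = Iterator.tabulate(250)(n => s"key$n")

  // Same loop shape as in the question, but collecting the lines instead of printing them.
  def lines(it: Iterator[String]): List[String] = {
    val out = List.newBuilder[String]
    for (group <- it.zipWithIndex.grouped(100)) {
      for ((record, i) <- group) {
        out += s"$i|$record"
      }
      // ... the per-100-records work would go here
    }
    out.result()
  }

  def main(args: Array[String]): Unit =
    lines(records).foreach(println)
}
```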
To troubleshoot, try decorating your iterator: wrap it in another iterator that prints what is going on.
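The original `wrap` code is not preserved here, so the following is only a sketch of what such a decorator could look like, assuming the reader exposes a Scala `Iterator`. It forwards every `hasNext` and `next` call to the underlying iterator and prints each call with its result. The enclosing `IteratorDebug` object is a hypothetical name.

```scala
object IteratorDebug {
  // Decorates an iterator so that every hasNext/next call is printed.
  // The name `wrap` follows the answer; this body is a sketch, not the original code.
  def wrap[A](it: Iterator[A]): Iterator[A] = new Iterator[A] {
    def hasNext: Boolean = {
      val result = it.hasNext
      println(s"hasNext -> $result")
      result
    }

    def next(): A = {
      val result = it.next()
      println(s"next -> $result")
      result
    }
  }
}
```

With the loop from the question, usage would be along the lines of `IteratorDebug.wrap(reader.iterator).zipWithIndex.grouped(100)`, so that every call `grouped` makes on the reader's iterator shows up in the trace.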
Call wrap on the iterator the very first time you instantiate it, before anything else touches it. This prints a trace of every hasNext and next call made on the iterator.
This should help you determine whether the iterator is ill-behaved. For instance, the library might not deal correctly with calling hasNext multiple times without calling next. In that case you can modify wrap so that the iterator behaves correctly.
One more thing: from the symptoms, it looks as if you have already consumed the iterator before grouped is called. So be extra careful, and check whether you have used the same iterator reference earlier.
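As one possible shape for that modification, here is a sketch assuming the failure mode is a `hasNext` that advances the underlying cursor. `BrokenReaderIterator` is a hypothetical simulation of such an iterator, not the real RecordReader. The fixed `wrap` calls the underlying `hasNext` at most once per element and caches the answer until `next()` consumes that element, so repeated `hasNext` calls from `grouped` become harmless.

```scala
// Hypothetical ill-behaved iterator: hasNext advances the cursor, so calling
// hasNext twice without next silently skips an element.
class BrokenReaderIterator[A](data: Vector[A]) extends Iterator[A] {
  private var pos = -1
  def hasNext: Boolean = { pos += 1; pos < data.length }
  def next(): A = data(pos)
}

object IteratorFix {
  // Variant of wrap that calls the underlying hasNext at most once per element
  // and caches the answer until next() consumes that element.
  def wrap[A](it: Iterator[A]): Iterator[A] = new Iterator[A] {
    private var cached: Option[Boolean] = None

    def hasNext: Boolean = {
      if (cached.isEmpty) cached = Some(it.hasNext)
      cached.get
    }

    def next(): A = {
      if (!hasNext) throw new NoSuchElementException("next on empty iterator")
      cached = None // the next hasNext call must query the underlying iterator again
      it.next()
    }
  }
}
```

Whether this is the right fix depends on how the real reader misbehaves, which the printing trace should reveal; if the trace shows something other than repeated `hasNext` calls, the wrapper needs to compensate for that instead.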