I really like the
for (line <- Source fromFile inputPath getLines) {doSomething line}
construction for iterating over a file in scala and am wondering if there is a way to use a similar construction for iterating over the lines in all the files in a directory.
An important restriction here is that the files together are far too large to fit on the heap (think dozens of GB, so increasing the heap size isn’t an option). As a workaround for the time being, I have been cat’ing everything together into one file and using the above construction, which works b/c of laziness.
Point being, this seems to raise questions like.. can I concatenate two (hundred) lazy iterators and get a really big, really lazy one?
Yes, although it’s not quite so concise:
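The code that originally followed here did not survive, so what follows is only a minimal sketch of the idea, not the answer's own code. It assumes a flat directory with no recursion, the default codec, and hypothetical names (`DirLines`, `linesIn`, `linesInDesugared`). The first method uses a for-comprehension over the files and their lines; the second spells out roughly the same thing with explicit `filter` and `flatMap` calls. Both return a lazy `Iterator[String]`, so a file is only opened once iteration reaches it.

```scala
import java.io.File
import scala.io.Source

object DirLines {
  // Regular files directly inside `dir`; listFiles returns null when `dir`
  // is not a readable directory, hence the Option wrapper.
  private def filesIn(dir: File): Iterator[File] =
    Option(dir.listFiles).getOrElse(Array.empty[File]).iterator

  // Lazily iterate over every line of every regular file in `dir`.
  def linesIn(dir: File): Iterator[String] =
    for {
      file <- filesIn(dir)
      if file.isFile
      line <- Source.fromFile(file).getLines()
    } yield line

  // Roughly the same thing written with explicit combinators.
  def linesInDesugared(dir: File): Iterator[String] =
    filesIn(dir)
      .filter(_.isFile)
      .flatMap(file => Source.fromFile(file).getLines())
}
```

Something like `DirLines.linesIn(new File(dir)).foreach(doSomething)` would then process one line at a time. Note that `listFiles` makes no guarantee about the order of the files, and that this sketch never closes the sources it opens.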
The trick is flatMap and its for-comprehension syntactic sugar. The above, for example, is more or less equivalent to calling flatMap directly on the iterator of files.
As Daniel Sobral notes in a comment below, this approach (and the code in your question) will leave files open. If this is a one-off script or you’re just working in the REPL, this might not be a big deal. If you do run into problems, you can use the pimp-my-library pattern to implement some basic resource management:
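One way such a wrapper might look, as a minimal sketch rather than the original code: it uses an implicit class (the newer spelling of the pimp-my-library pattern) to add a `getLinesAndClose` method to `Source`. The names `ClosingSource` and `ClosingSourceOps` are made up for illustration; only `getLinesAndClose` comes from the text.

```scala
import scala.io.Source

object ClosingSource {
  // Adds getLinesAndClose to any Source via an implicit wrapper class.
  implicit class ClosingSourceOps(private val source: Source) {
    // Like getLines, but closes the source once the last line has been read.
    def getLinesAndClose: Iterator[String] = new Iterator[String] {
      private val lines = source.getLines()
      private var open = true

      def hasNext: Boolean = {
        // Close exactly once, as soon as the underlying iterator is exhausted.
        if (open && !lines.hasNext) { source.close(); open = false }
        open
      }

      def next(): String = {
        if (!hasNext) throw new NoSuchElementException("next on empty iterator")
        val line = lines.next()
        hasNext // triggers the close right away if that was the last line
        line
      }
    }
  }
}
```

With `import ClosingSource._` in scope, the method is available on every `Source`. The source is only closed when the iterator is consumed to the end, so a loop that stops early still leaves the file open.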
Now just use
Source fromFile file getLinesAndClose
and you won’t have to worry about files being left open.