I’m writing a program as follows:
- Find all files with the correct extension in a given directory
- For each of those files, find all occurrences of a given string
- Print each matching line
I’d like to write this in a functional way, as a series of generator functions (methods that use `yield return` and so produce one item at a time, lazily), so my code would read like this:
```csharp
IEnumerable<string> allFiles = GetAllFiles();
IEnumerable<string> matchingFiles = GetMatches( "*.txt", allFiles );
IEnumerable<string> contents = GetFileContents( matchingFiles );
IEnumerable<string> matchingLines = GetMatchingLines( contents );

foreach( var lineText in matchingLines )
    Console.WriteLine( "Found: " + lineText );
```
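For concreteness, here is one minimal sketch of what those four generator functions could look like with `yield return`. The extra parameters (a root directory and the search text), the `Pipeline` class, and the extension-matching rule are assumptions for illustration, not part of the question:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical implementations of the four stages named in the question.
// The root-directory and search-text parameters are assumptions.
public static class Pipeline
{
    // Lazily yields every file path under root (recursing into subdirectories).
    public static IEnumerable<string> GetAllFiles(string root)
    {
        foreach (var path in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
            yield return path;
    }

    // Lazily yields the paths whose extension matches a pattern such as "*.txt".
    public static IEnumerable<string> GetMatches(string pattern, IEnumerable<string> files)
    {
        string extension = pattern.TrimStart('*'); // "*.txt" -> ".txt"
        foreach (var file in files)
        {
            if (string.Equals(Path.GetExtension(file), extension, StringComparison.OrdinalIgnoreCase))
                yield return file;
        }
    }

    // Lazily yields the lines of each file in turn; File.ReadLines streams them.
    public static IEnumerable<string> GetFileContents(IEnumerable<string> files)
    {
        foreach (var file in files)
            foreach (var line in File.ReadLines(file))
                yield return line;
    }

    // Lazily yields only the lines that contain the search text.
    public static IEnumerable<string> GetMatchingLines(IEnumerable<string> lines, string searchText)
    {
        foreach (var line in lines)
        {
            if (line.Contains(searchText))
                yield return line;
        }
    }
}
```

Nothing is read from disk until the final `foreach` asks for the first line; each stage then pulls exactly one item at a time from the stage before it.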
This is all fine, but what I’d also like to do is print some statistics at the end. Something like this:
Found 233 matches in 150 matching files. Scanned 3,297 total files in 5.72s
The problem is that, with the code written in a ‘pure functional’ style like the above, every item is loaded lazily.
You don’t know how many files match in total until the final `foreach` loop completes, and because only one item is ever yielded at a time, the code has no place to keep track of how many things it has found so far. If you call LINQ’s `matchingLines.Count()` method, it will re-enumerate the whole collection!
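A tiny sketch makes the re-enumeration concrete. The generator below (a hypothetical `Numbers`, with a callback added purely so we can observe it) records each time its body starts running; a `foreach` followed by LINQ’s `Count()` runs it twice:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReEnumerationDemo
{
    // Lazy generator: onStart is invoked each time a fresh enumeration begins.
    public static IEnumerable<int> Numbers(Action onStart)
    {
        onStart();
        for (int i = 0; i < 3; i++)
            yield return i;
    }

    // Returns (items seen by foreach, result of Count(), number of times the body ran).
    public static (int seen, int count, int runs) Run()
    {
        int runs = 0;
        IEnumerable<int> numbers = Numbers(() => runs++); // nothing has executed yet
        int seen = 0;
        foreach (var n in numbers)
            seen++;                    // first full pass over the sequence
        int count = numbers.Count();  // Count() walks the whole sequence again
        return (seen, count, runs);
    }
}
```

With a real file-scanning pipeline, that second pass means reading every file again.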
I can think of many ways to solve this problem, but all of them seem to be somewhat ugly. It strikes me as something that people are bound to have done before, and I’m sure there’ll be a nice design pattern which shows a best practice way of doing this.
Any ideas? Cheers
In a similar vein to other answers, but taking a slightly more generic approach …
… why not create a Decorator class that can wrap an existing `IEnumerable` implementation and calculate the statistic as it passes the items through?
Here’s a `Counter` class I just threw together, but you could create variations for other kinds of aggregation too.
You could create three instances of `Counter`:
- one wrapping `GetAllFiles()`, counting the total number of files;
- one wrapping `GetMatches()`, counting the number of matching files; and
- one wrapping `GetMatchingLines()`, counting the number of matching lines.
The key with this approach is that you’re not layering multiple responsibilities onto your existing classes/methods: the `GetMatchingLines()` method only handles the matching; you’re not asking it to track stats as well.
Clarification in response to a comment by Mitcham: the final code would look something like this:
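A minimal sketch of such a counting decorator and of the wiring, assuming a hypothetical `Counter<T>` with a `Count` property, and using an in-memory dictionary (path to lines) plus LINQ `Where`/`SelectMany` as stand-ins for `GetAllFiles()`, `GetMatches()` and `GetMatchingLines()`:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Pass-through decorator: yields every item of the wrapped sequence unchanged
// and counts the items as they go by. Name and members are assumptions.
public sealed class Counter<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> _source;

    public Counter(IEnumerable<T> source) => _source = source;

    // Number of items that have passed through so far.
    public int Count { get; private set; }

    public IEnumerator<T> GetEnumerator()
    {
        foreach (var item in _source)
        {
            Count++;
            yield return item;
        }
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class Search
{
    // Runs the lazy pipeline over a hypothetical in-memory "file system"
    // (path -> lines), prints each match, then prints and returns the statistics.
    public static (int matches, int matchingFiles, int totalFiles) Run(
        IDictionary<string, string[]> files, string extension, string searchText)
    {
        var timer = Stopwatch.StartNew();

        var allFiles = new Counter<string>(files.Keys);
        var matchingFiles = new Counter<string>(
            allFiles.Where(f => f.EndsWith(extension, StringComparison.OrdinalIgnoreCase)));
        var matchingLines = new Counter<string>(
            matchingFiles.SelectMany(f => files[f]).Where(line => line.Contains(searchText)));

        foreach (var lineText in matchingLines)
            Console.WriteLine("Found: " + lineText);

        // The single foreach above has driven every stage once, so all counts are final.
        Console.WriteLine($"Found {matchingLines.Count} matches in {matchingFiles.Count} matching files. " +
                          $"Scanned {allFiles.Count:N0} total files in {timer.Elapsed.TotalSeconds:F2}s");
        return (matchingLines.Count, matchingFiles.Count, allFiles.Count);
    }
}
```

Because every stage is enumerated exactly once by that single `foreach`, the three counts are complete as soon as the loop ends, with no second pass over the files.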
Note that this is still a functional approach – the variables used are immutable (more like bindings than variables), and the overall function has no side-effects.