I have a query that is bothering me. It is encapsulated as a new query operator, and I wrote two versions of it to see which one performs better. Both perform horribly.
First attempt: declarative style
    public static IEnumerable<IEnumerable<α>> Section<α>(this IEnumerable<α> source, int length)
    {
        return source.Any()
            ? source.Take(length).Cons(source.Skip(length).Section(length))
            : Enumerable.Empty<IEnumerable<α>>();
    }
Second attempt: imperative “yield return” style
    public static IEnumerable<IEnumerable<α>> Section<α>(this IEnumerable<α> source, int length)
    {
        var fst = source.Take(length);
        var rst = source.Skip(length);
        yield return fst;
        if (rst.Any())
            foreach (var section in rst.Section(length))
                yield return section;
    }
In fact, the second attempt is worse in readability, in compositionality, and in speed.
Any clues as to how to optimize this?
I suspect that the problem you’re having is related to the fact that enumerating the final result is at least an O(n^2) operation, possibly worse; I haven’t worked it all out in my head yet.
Why is that? Well, suppose you have [1, 2, 3, 4, 5, 6] and you split that up into what you think is { {1, 2}, {3, 4}, {5, 6} }.
That’s not what you’ve done. You’ve in fact split this up into { take the first two, take the first two and discard them and then take the next two, take the first two and discard them and then take the next two and discard them and then take the third two }.
Notice how each step along the way recalculates the result? That’s because the array could be changing between calls to the enumeration. LINQ was designed to always get you up-to-date results; when you write a query that means “skip the first four and iterate the next two”, that’s exactly what you get: a query that executes that code when you enumerate it.
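To make the repeated work visible, here is a minimal sketch that counts how many elements the source hands out while three sections of [1, 2, 3, 4, 5, 6] are enumerated. The names `ReenumerationDemo`, `Counted`, `Pulled` and `PullsForThreeSections` are made up for this illustration, and the three queries are written out by hand instead of going through `Section`, so the extra cost of the `Any()` calls is not included.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ReenumerationDemo
{
    // Total number of elements the source has produced so far (hypothetical counter).
    public static int Pulled;

    // A lazy source of 1..count that records every element it yields.
    // Each new enumeration starts again from 1.
    public static IEnumerable<int> Counted(int count)
    {
        for (int i = 1; i <= count; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    // Enumerates the three "sections" of a six-element source and
    // returns how many elements the source had to produce in total.
    public static int PullsForThreeSections()
    {
        Pulled = 0;
        var source = Counted(6);

        var sections = new[]
        {
            source.Take(2),                  // reads 2 elements
            source.Skip(2).Take(2),          // reads 4 elements: 2 discarded, 2 kept
            source.Skip(2).Skip(2).Take(2),  // reads 6 elements: 4 discarded, 2 kept
        };

        foreach (var section in sections)
            foreach (var item in section)
            {
                // Consuming the items is what actually runs each query.
            }

        return Pulled;  // 2 + 4 + 6 = 12
    }
}
```

Six elements cost twelve reads here. With k sections of a fixed length the total grows like 1 + 2 + ... + k, which is where the quadratic behaviour comes from.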
Is the original sequence small enough and fast enough that you can read the whole thing into memory and split it all up at once, rather than trying to do so lazily? Alternatively, is the sequence indexable? If all you get is forward access to the sequence and it is too big or slow to read into memory all at once then there is not a whole lot you can do here. But if you have one or both of those properties then you can make this at least linear.
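As a rough sketch of those two options, and not a drop-in replacement for the original operator, something along these lines reads each element once. `SectionIndexed` and `SectionEager` are hypothetical names, the sections are returned as arrays for simplicity, and an empty source is assumed to produce no sections at all.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SectionSketch
{
    // Indexable case: every element is read exactly once, by position.
    // Because this is an iterator, the argument check runs when enumeration starts.
    public static IEnumerable<T[]> SectionIndexed<T>(this IReadOnlyList<T> source, int length)
    {
        if (length <= 0) throw new ArgumentOutOfRangeException(nameof(length));

        for (int start = 0; start < source.Count; start += length)
        {
            // The last section may be shorter than the requested length.
            int size = Math.Min(length, source.Count - start);
            var section = new T[size];
            for (int i = 0; i < size; i++)
                section[i] = source[start + i];
            yield return section;
        }
    }

    // Small-enough case: copy the sequence into memory once, then split the copy.
    // ToList() runs immediately, so later changes to the source are not seen.
    public static IEnumerable<T[]> SectionEager<T>(this IEnumerable<T> source, int length)
    {
        return source.ToList().SectionIndexed(length);
    }
}
```

For example, `new List<int> { 1, 2, 3, 4, 5, 6, 7 }.SectionIndexed(3)` would give { 1, 2, 3 }, { 4, 5, 6 } and { 7 }. The trade-off is the one described above: the results are a snapshot and are no longer recomputed against the live source on every enumeration.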