I have several “data sources”, each of which provides ordered, timestamped data. I’d like to flatten them into a single ordered stream (like the merge step of merge sort). This answer describes how to do it for two enumerables, but I am not sure how to generalize it to any number of sources.
The data sources are huge, so I cannot do this in memory; it has to be streamed.
To explain it with an example, I have something like this:
interface IDataSource
{
    IEnumerable<DateTime> GetOrderedRecords();
}
I would like to be able to have an extension method like this:
// get all sources
IEnumerable<IDataSource> dataSources = GetAllSources();
// merge sort
IEnumerable<DateTime> flattened = dataSources
    .MergeSort(s => s.GetOrderedRecords());
[Edit]
The reason I can’t load everything eagerly and then sort it is that I am loading data from multiple databases and exporting it into a different one. Each IDataSource is basically Linq-to-NHibernate under the hood, and I have millions of data rows to return.
So what I need is something like:
- From all available sources, load the next timestamp.
- Store it to disk and “forget it”.
Data sources are already sorted, which makes the “merge sort” approach feasible.
One simple thing you could do is to chain the calls to the Merge implementation from the question you linked:
You would call it like this:
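As a rough sketch of what such chaining could look like: the two-sequence Merge below is a hypothetical stand-in, not the exact code from the linked question, and MergeSort simply folds it over all sources with Aggregate. The class name MergeExtensions and the IComparable<T> constraint are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MergeExtensions
{
    // Hypothetical two-sequence merge; both inputs must already be sorted.
    public static IEnumerable<T> Merge<T>(this IEnumerable<T> first, IEnumerable<T> second)
        where T : IComparable<T>
    {
        using (var a = first.GetEnumerator())
        using (var b = second.GetEnumerator())
        {
            bool hasA = a.MoveNext();
            bool hasB = b.MoveNext();

            // Emit the smaller head and advance only that side.
            while (hasA && hasB)
            {
                if (a.Current.CompareTo(b.Current) <= 0)
                {
                    yield return a.Current;
                    hasA = a.MoveNext();
                }
                else
                {
                    yield return b.Current;
                    hasB = b.MoveNext();
                }
            }

            // Drain whichever side still has items.
            while (hasA) { yield return a.Current; hasA = a.MoveNext(); }
            while (hasB) { yield return b.Current; hasB = b.MoveNext(); }
        }
    }

    // Chains Merge over every source: ((empty + s1) + s2) + s3 ...
    public static IEnumerable<TItem> MergeSort<TSource, TItem>(
        this IEnumerable<TSource> sources,
        Func<TSource, IEnumerable<TItem>> selector)
        where TItem : IComparable<TItem>
    {
        return sources
            .Select(selector)
            .Aggregate(Enumerable.Empty<TItem>(), (merged, next) => merged.Merge(next));
    }
}
```

Aggregate walks the list of sources up front, but because Merge is an iterator, no records are read until the result is enumerated, so the data itself is still streamed.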
This has the drawback that each call to MoveNext on the enumerator of the returned enumerable yields quite a lot of MoveNext calls on the nested enumerables.
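One possible way around that drawback is a k-way merge that keeps the current head of every source in a min-heap, so each returned item costs a single MoveNext on one source plus an O(log k) heap update. The sketch below is a different technique from the chained Merge, not part of the linked answer. It assumes .NET 6 or later for System.Collections.Generic.PriorityQueue, and the names HeapMergeExtensions and MergeSortWithHeap are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class HeapMergeExtensions
{
    // Hypothetical k-way merge; every source must already be sorted.
    public static IEnumerable<TItem> MergeSortWithHeap<TSource, TItem>(
        this IEnumerable<TSource> sources,
        Func<TSource, IEnumerable<TItem>> selector)
    {
        // Each heap entry is a source enumerator, prioritized by its current item.
        var heap = new PriorityQueue<IEnumerator<TItem>, TItem>();
        try
        {
            foreach (var source in sources)
            {
                var enumerator = selector(source).GetEnumerator();
                if (enumerator.MoveNext())
                    heap.Enqueue(enumerator, enumerator.Current);
                else
                    enumerator.Dispose(); // empty source, nothing to merge
            }

            while (heap.Count > 0)
            {
                // Peek first so the enumerator stays in the heap (and gets
                // disposed in finally) if the caller stops enumerating early.
                var smallest = heap.Peek();
                yield return smallest.Current;

                heap.Dequeue();
                if (smallest.MoveNext())
                    heap.Enqueue(smallest, smallest.Current);
                else
                    smallest.Dispose();
            }
        }
        finally
        {
            // Release any sources that were not fully consumed.
            while (heap.TryDequeue(out var leftover, out _))
                leftover.Dispose();
        }
    }
}
```

With the interface from the question, the call would have the same shape as before: dataSources.MergeSortWithHeap(s => s.GetOrderedRecords()). Only one record per source is held in memory at any time. Note that PriorityQueue does not guarantee any particular order among equal timestamps.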