I'm writing a C# (.NET 4.5) application that aggregates time-based events for reporting purposes. To make my query logic reusable for both real-time and historical data, I use the Reactive Extensions (2.0) and their IScheduler infrastructure (HistoricalScheduler and friends).
For example, assume we create a list of events (sorted chronologically, but they may coincide!) whose only payload is their timestamp, and we want to know their distribution across buffers of a fixed duration:
const int num = 100000;
const int dist = 10;
var events = new List<DateTimeOffset>();
var curr = DateTimeOffset.Now;
var gap = new Random();
var time = new HistoricalScheduler(curr);
for (int i = 0; i < num; i++)
{
    events.Add(curr);
    curr += TimeSpan.FromMilliseconds(gap.Next(dist));
}
var stream = Observable.Generate<int, DateTimeOffset>(
    0,
    s => s < events.Count,
    s => s + 1,
    s => events[s],
    s => events[s],
    time);
stream.Buffer(TimeSpan.FromMilliseconds(num), time)
    .Subscribe(l => Console.WriteLine(time.Now + ": " + l.Count));
time.AdvanceBy(TimeSpan.FromMilliseconds(num * dist));
Running this code results in a System.StackOverflowException with the following stack trace (the last three lines repeat all the way down):
mscorlib.dll!System.Threading.Interlocked.Exchange<System.IDisposable>(ref System.IDisposable location1, System.IDisposable value) + 0x3d bytes
System.Reactive.Core.dll!System.Reactive.Disposables.SingleAssignmentDisposable.Dispose() + 0x37 bytes
System.Reactive.Core.dll!System.Reactive.Concurrency.ScheduledItem<System.DateTimeOffset>.Cancel() + 0x23 bytes
...
System.Reactive.Core.dll!System.Reactive.Disposables.AnonymousDisposable.Dispose() + 0x4d bytes
System.Reactive.Core.dll!System.Reactive.Disposables.SingleAssignmentDisposable.Dispose() + 0x4f bytes
System.Reactive.Core.dll!System.Reactive.Concurrency.ScheduledItem<System.DateTimeOffset>.Cancel() + 0x23 bytes
...
Ok, the problem seems to come from my use of Observable.Generate(), depending on the list size (num) and regardless of the choice of scheduler.
What am I doing wrong? Or more generally, what would be the preferred way to create an IObservable from an IEnumerable of events that provide their own timestamps?
(Update: realized I didn't provide an alternative; see the bottom of the answer.)
The problem is in how Observable.Generate works: it's used to unfold a corecursive (think recursion turned inside out) generator based on the arguments; if those arguments end up generating a very deeply nested corecursive generator, you'll blow your stack.
From this point on, I'm speculating a lot (don't have the Rx source in front of me; see below), but I'm willing to bet your definition ends up expanding into something like:
And on and on until your call stack gets big enough to overflow. At, say, a method signature plus your int counter, that'd be something like 8-16 bytes per recursive call (more depending on how the state machine generator is implemented), so 60,000 sounds about right (1M / 16 ~ 62500 maximum depth).
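To make the "one stack frame per level" idea concrete, here is a small stand-alone sketch. It is purely illustrative and is not Rx source code: NestedDisposable is a made-up type that only mimics the shape of the repeating Dispose/Cancel frames in the stack trace from the question. The depth is kept small on purpose so the demo itself does not overflow.

```csharp
using System;

// Illustrative only: NOT Rx code. A disposable that wraps an inner disposable
// and disposes it recursively, recording the deepest nesting reached.
sealed class NestedDisposable : IDisposable
{
    public static int CurrentDepth;
    public static int MaxDepth;

    private readonly IDisposable _inner;

    public NestedDisposable(IDisposable inner)
    {
        _inner = inner;
    }

    public void Dispose()
    {
        CurrentDepth++;
        MaxDepth = Math.Max(MaxDepth, CurrentDepth);
        if (_inner != null)
        {
            _inner.Dispose(); // one additional stack frame per nesting level
        }
        CurrentDepth--;
    }
}

static class Program
{
    static void Main()
    {
        const int levels = 1000; // small enough that this demo cannot overflow

        // Build a chain in which every item wraps the previous one.
        IDisposable chain = null;
        for (int i = 0; i < levels; i++)
        {
            chain = new NestedDisposable(chain);
        }

        // Disposing the outermost item walks the whole chain recursively.
        chain.Dispose();
        Console.WriteLine("Deepest Dispose nesting: " + NestedDisposable.MaxDepth);
    }
}
```

The nesting depth grows linearly with the number of chained items, so with enough items a chain like this runs out of stack no matter how little each frame costs.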
EDIT: Pulled up the source and confirmed: the "Run" method of Generate looks like this (take note of the nested calls to Generate):
EDIT: Derp, didn't offer any alternatives... here's one that might work:
(EDIT: fixed Enumerable.Range, so stream size won't be multiplied by chunkSize)
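For the more general question (turning an IEnumerable of self-timestamped events into an IObservable), another option is to skip Generate entirely and schedule one action per event directly on the scheduler. This is a different technique from the chunked one mentioned above, and I haven't profiled it. Replay and ToTimedObservable are hypothetical names, not part of Rx; the sketch assumes the Rx 2.0 assemblies are referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Concurrency;
using System.Reactive.Linq;
using System.Reactive.Subjects;

static class Replay
{
    // Hypothetical helper (not part of Rx): pushes each timestamp through a
    // Subject at its own due time. Every event is scheduled independently,
    // so no scheduled item wraps the next one.
    public static IObservable<DateTimeOffset> ToTimedObservable(
        IEnumerable<DateTimeOffset> timestamps, IScheduler scheduler)
    {
        var subject = new Subject<DateTimeOffset>();
        foreach (var timestamp in timestamps)
        {
            var due = timestamp; // local copy for the closure
            scheduler.Schedule(due, () => subject.OnNext(due));
        }
        return subject.AsObservable();
    }
}

static class Program
{
    static void Main()
    {
        const int num = 100000;
        const int dist = 10;

        var curr = DateTimeOffset.Now;
        var time = new HistoricalScheduler(curr);
        var gap = new Random();
        var events = new List<DateTimeOffset>();
        for (int i = 0; i < num; i++)
        {
            events.Add(curr);
            curr += TimeSpan.FromMilliseconds(gap.Next(dist));
        }

        // Same buffering query as in the question, fed by the helper above.
        Replay.ToTimedObservable(events, time)
            .Buffer(TimeSpan.FromMilliseconds(num), time)
            .Subscribe(l => Console.WriteLine(time.Now + ": " + l.Count));

        time.AdvanceBy(TimeSpan.FromMilliseconds(num * dist));
    }
}
```

Two caveats: the resulting sequence is hot (events are queued as soon as the helper is called, so subscribe before advancing the scheduler), and it never completes. If something downstream needs completion, you'd have to schedule an OnCompleted yourself after the last event.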