I am writing a very thread intensive application that hangs when it exits.
I’ve traced into the system units and found the place where the program enters an infinite loop. It’s in SysUtils line 19868 -> DoneMonitorSupport -> CleanEventList:
repeat until InterlockedCompareExchange(EventCache[I].Lock, 1, 0) = 0;
I’ve searched for a solution online and found a couple of QC reports:
Unfortunately, these don’t seem to relate to my situation as I don’t use either TThreadList or TMonitor.
I’m pretty sure that all my threads have finished and have been destroyed, as they all inherit from a base thread that keeps a create/destroy count.
Has anybody come across similar behaviour before? Do you know of any strategies for discovering where the root cause may lie?
I’ve been looking at how the TMonitor locks are implemented, and I finally made an interesting discovery. For a bit of drama, I’ll first tell you how the locks work.

When you call any TMonitor function on a TObject, a new instance of the TMonitor record is created and that instance is assigned to a MonitorFld inside the object itself. This assignment is made in a thread-safe way, using InterlockedCompareExchangePointer. Because of this trick the TObject only contains one pointer-size amount of data for the support of TMonitor; it doesn’t contain the full TMonitor structure. And that’s a good thing.

This TMonitor structure contains a number of fields. We’ll start with the FLockCount: Integer field. When the first thread uses TMonitor.Enter() on any object, this combined lock-counter field will have the value ZERO. Again using an InterlockedCompareExchange method, the lock is acquired and the counter is initialized. There will be no blocking of the calling thread and no context switch, since this is all done in-process.

When the second thread tries to TMonitor.Enter() the same object, its first attempt to lock will fail. When that happens Delphi follows two strategies:

- If you used TMonitor.SetSpinCount() to set a number of "spins", then Delphi will do a busy-wait loop, spinning the given number of times. That’s very nice for tiny locks because it allows acquiring the lock without doing a context switch.
- Otherwise, TMonitor.Enter() will initiate a Wait on the event returned by TMonitor.GetEvent(). In other words it will not busy-wait wasting CPU cycles. Remember the TMonitor.GetEvent() because that’s very important.

Let’s say we’ve got a thread that acquired the lock and a thread that tried to acquire the lock but is now waiting on the event returned by TMonitor.GetEvent. When the first thread calls TMonitor.Exit() it will notice (via the FLockCount field) that there is at least one other thread blocking. So it immediately pulses what should normally be the previously allocated event (it calls TMonitor.GetEvent()). But since the two threads, the one that calls TMonitor.Exit() and the one that called TMonitor.Enter(), might actually call TMonitor.GetEvent() at the same time, there are a couple more tricks inside TMonitor.GetEvent() to make sure that only one event is allocated, irrespective of the order of operations.

For a few more fun moments we’ll now delve into the way TMonitor.GetEvent() works. This thing lives inside the System unit (you know, the one we can’t recompile to play with), but it turns out it delegates the duty of actually allocating the Event to another unit, through the System.MonitorSupport pointer. That points to a record of type TMonitorSupport that declares 5 function pointers:

- NewSyncObject: allocates a new Event for synchronization purposes
- FreeSyncObject: deallocates the Event allocated for synchronization purposes
- NewWaitObject: allocates a new Event for Wait operations
- FreeWaitObject: deallocates that Wait event
- WaitAndOrSignalObject: well.. waits or signals.

It also turns out that the objects returned by the NewXYZ functions could be anything, because they’re only used for the call to WaitXYZ and for the corresponding call to FreeXyzObject. The way those functions are implemented in SysUtils is designed to provide those locks with a minimum amount of locking and context switching. Because of that, the objects themselves (returned by NewSyncObject and NewWaitObject) are not directly the Events returned by CreateEvent(), but pointers to records in the SyncEventCacheArray. It goes even further: actual Windows Events are not created until required. Because of that, the records in the SyncEventCacheArray contain a couple of fields:

- TSyncEventItem.Lock: this tells Delphi whether the Lock is being used for anything right now or not
- TSyncEventItem.Event: this holds the actual Event that’ll be used for synchronization, if waiting is required.

When the application terminates, SysUtils.DoneMonitorSupport goes over all the records in the SyncEventCacheArray and waits for the Lock to become ZERO, i.e., waits for the lock to stop being used by anything. Theoretically, as long as that lock is NOT zero, at least one thread out there might be using the lock, so the sane thing to do would be to wait, in order to NOT cause AccessViolation errors. And we finally got to our current question: HANGING in SysUtils.DoneMonitorSupport.

Why might an application hang in SysUtils.DoneMonitorSupport even if all its threads terminated properly?

Because at least one Event allocated using any one of NewSyncObject or NewWaitObject was not freed using its corresponding FreeSyncObject or FreeWaitObject. And we go back to the TMonitor.GetEvent() routine. The Event it allocates is saved in the TMonitor record that corresponds to the object that was used for TMonitor.Enter(). The pointer to that record is only kept in that object’s instance data, and is kept there for the life of the application. Searching for the name of the field, FLockEvent, we find this in the System.pas file:

and a call to that record-destructor in here: procedure TObject.CleanupInstance.

In other words, the final sync-event is only released when the object that was used for synchronization is freed!
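To make the scenario concrete, here is a minimal sketch of how such a leftover event could come about. It assumes a Delphi version that has TMonitor; the TContender class, the program name and the 200 ms hold time are made up for illustration. Two threads contend for the same lock object, so the second one has to wait on the event, and the lock object is then deliberately never freed. Going by the explanation above, this is the kind of program that would be expected to get stuck in DoneMonitorSupport on affected RTL versions; I have not verified it on every version.

```pascal
program MonitorLeakSketch; // hypothetical name

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Hypothetical worker that holds the shared lock for a while,
  // so that the other worker has to block on it.
  TContender = class(TThread)
  private
    FLock: TObject;
  protected
    procedure Execute; override;
  public
    constructor Create(ALock: TObject);
  end;

constructor TContender.Create(ALock: TObject);
begin
  inherited Create(False); // the thread only starts after construction completes
  FLock := ALock;
end;

procedure TContender.Execute;
begin
  TMonitor.Enter(FLock);
  try
    Sleep(200); // long enough for the second thread to start waiting
  finally
    TMonitor.Exit(FLock);
  end;
end;

var
  Lock: TObject;
  A, B: TContender;
begin
  Lock := TObject.Create;
  A := TContender.Create(Lock);
  B := TContender.Create(Lock);
  A.WaitFor;
  B.WaitFor;
  A.Free;
  B.Free;
  // Both threads are gone at this point, but Lock is never freed,
  // so its TMonitor record (and the event it holds) is never released.
  // Adding "Lock.Free;" here is what releases it.
end.
```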
Answer to OP’s question:
The application hangs because at least one OBJECT that was used for TMonitor.Enter() was not freed.

Possible solutions:
Unfortunately I don’t like this. It’s not right, I mean the penalty for not freeing a small object should be a small memory leak, not a hanging application! This is especially bad for Service applications where a service might simply hang for ever, not fully shut down but unable to respond to any request.
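Like it or not, the practical consequence for now is to give every lock object a clear owner that frees it. A minimal sketch of that pattern follows; TSharedCounter and its members are hypothetical names, not anything from the RTL. The lock object is created in the constructor and freed in the destructor, so it goes through TObject.CleanupInstance before the application shuts down.

```pascal
program OwnedLockSketch; // hypothetical name

{$APPTYPE CONSOLE}

type
  // Hypothetical class that owns the object it locks on.
  TSharedCounter = class
  private
    FLock: TObject;
    FValue: Integer;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Increment;
    property Value: Integer read FValue;
  end;

constructor TSharedCounter.Create;
begin
  inherited Create;
  FLock := TObject.Create;
end;

destructor TSharedCounter.Destroy;
begin
  // Freeing the lock object is what lets its monitor data be released.
  FLock.Free;
  inherited Destroy;
end;

procedure TSharedCounter.Increment;
begin
  TMonitor.Enter(FLock);
  try
    Inc(FValue);
  finally
    TMonitor.Exit(FLock);
  end;
end;

var
  Counter: TSharedCounter;
begin
  Counter := TSharedCounter.Create;
  try
    Counter.Increment;
    Writeln(Counter.Value);
  finally
    Counter.Free; // also frees the lock object via the destructor
  end;
end.
```

The same idea applies when the object passed to TMonitor.Enter() is some long-lived instance such as a singleton: it still has to be freed before SysUtils is finalized.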
The solutions for the Delphi team? They should NOT hang in the finalization code of the SysUtils unit, no matter what. They should probably ignore the Lock and move on to closing the Event handle. At that stage (finalization of the SysUtils unit), if there’s still code running in some thread, it’s in really bad shape: most of the units have been finalized, so it’s not running in the environment it was designed to run in.

For the Delphi users? We can replace the MonitorSupport with our own version, one that doesn’t do those extensive tests at finalization time.
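Short of a full replacement, the same hook can at least be used to find out whether sync objects are being left behind. The unit below is only a sketch under assumptions: it assumes System.MonitorSupport is an assignable PMonitorSupport, and that NewSyncObject and FreeSyncObject have the shapes "function: Pointer" and "procedure(SyncObject: Pointer)" as described above. Check those against the System.pas of your Delphi version before relying on it. The unit name and the counter are hypothetical.

```pascal
unit MonitorLeakCheck; // hypothetical unit name

interface

implementation

uses
  Windows, SysUtils;

var
  OrigSupport: PMonitorSupport;   // the table installed by SysUtils
  Wrapped: TMonitorSupport;       // our copy with two entries replaced
  LiveSyncObjects: Integer = 0;   // allocated minus freed sync objects

// Assumed signature: function: Pointer
function CountingNewSyncObject: Pointer;
begin
  InterlockedIncrement(LiveSyncObjects);
  Result := OrigSupport.NewSyncObject();
end;

// Assumed signature: procedure(SyncObject: Pointer)
procedure CountingFreeSyncObject(SyncObject: Pointer);
begin
  InterlockedDecrement(LiveSyncObjects);
  OrigSupport.FreeSyncObject(SyncObject);
end;

initialization
  OrigSupport := System.MonitorSupport;
  if OrigSupport <> nil then
  begin
    // Keep every other entry as SysUtils set it up.
    Wrapped := OrigSupport^;
    Wrapped.NewSyncObject := CountingNewSyncObject;
    Wrapped.FreeSyncObject := CountingFreeSyncObject;
    System.MonitorSupport := @Wrapped;
  end;

finalization
  if OrigSupport <> nil then
  begin
    // Put the original table back before SysUtils finalizes.
    System.MonitorSupport := OrigSupport;
    OutputDebugString(PChar(Format(
      'Sync objects still allocated at shutdown: %d', [LiveSyncObjects])));
  end;

end.
```

Because this unit uses SysUtils, its initialization runs after SysUtils has installed its table and its finalization runs before SysUtils is finalized. A non-zero count in the debug output would suggest that some object used with TMonitor.Enter() is still alive at shutdown.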