I have a Windows service written in C# that spawns several worker threads. Each thread is supposed to loop every X minutes until the service is stopped, and in most cases this works very well. However, one thread appears to stop for no reason. We already have a try/catch block with logging code around the thread’s entire function, but it never logs any exceptions.
In .NET, is there any way to monitor a thread from another process and record when/why/how it stopped?
More details
The code that spawns the thread looks like this:
    try
    {
        // Create a new thread for processing Incoming Emails
        IncomingEmailThread = new Thread(new ThreadStart(ProcessIncomingEmails));
        IncomingEmailThread.Start();
        LogEvent("Service Started", EventLogEntryType.Information);
    }
    catch (Exception e)
    {
        LogEvent(e.Message, EventLogEntryType.Error);
    }
And the code inside the thread looks like this:
    while (!Closing)
    {
        try
        {
            // Wait for 5 minutes before running.
            InterruptableSleep.WaitOne(300000, false);
            // Process the incoming email for all instances
            string[] Instances = Settings.GetAllInstances();
            foreach (string Instance in Instances)
            {
                Logic.IncomingEmail IncomingEmailInstance = new Logic.IncomingEmail(Instance);
                IncomingEmailInstance.CreateRecordsFromIncomingEmail();
            }
        }
        catch (Exception ex)
        {
            // Log the exception and then eat it so it doesn't stop the thread
            LogEvent(ex.Message + "\r\n" + ex.StackTrace, EventLogEntryType.Error);
        }
    }
The problem is not caused by the Closing flag, because this loop usually runs for several days before it stops working. The problem is not an exception inside CreateRecordsFromIncomingEmail(), because the catch block has not logged any exceptions. Our logging code writes directly to the Windows event log, we use it throughout the product, and it is very reliable.
Unfortunately, we can’t attach a debugger, because we’ve only seen the problem on one production server. We haven’t been able to reproduce it in dev or on any other server.
We never did find a solution, but the problem stopped happening. We decided to just add some logging code in case it ever happens again.
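One way such logging could look, as a sketch rather than the code actually deployed: the worker records a timestamp and a stage name as it progresses, and a second thread (for example a System.Threading.Timer callback) periodically checks whether the worker has exited or has merely gone quiet. WorkerHeartbeat, Beat and Check are hypothetical names introduced for this example. The 10-minute silence threshold is an assumption, chosen only to exceed the 5-minute wait in the loop above.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not part of the original service): remembers when the
// worker last reported progress, so a monitor can tell "exited" from "stuck".
public sealed class WorkerHeartbeat
{
    private readonly object sync = new object();
    private DateTime lastBeatUtc = DateTime.UtcNow;
    private string lastStage = "created";

    // Called by the worker at each point of interest in its loop.
    public void Beat(string stage)
    {
        lock (sync)
        {
            lastBeatUtc = DateTime.UtcNow;
            lastStage = stage;
        }
    }

    // Returns a description of the problem, or null if the worker looks healthy.
    public string Check(Thread worker, TimeSpan maxSilence, DateTime nowUtc)
    {
        lock (sync)
        {
            if (!worker.IsAlive)
                return "Worker exited; last stage: " + lastStage;

            TimeSpan silence = nowUtc - lastBeatUtc;
            if (silence > maxSilence)
                return "Worker alive but silent for " + silence
                     + "; state: " + worker.ThreadState
                     + "; last stage: " + lastStage;

            return null;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var heartbeat = new WorkerHeartbeat();
        var worker = new Thread(() =>
        {
            try
            {
                heartbeat.Beat("loop iteration started");
                Thread.Sleep(50); // stands in for the email processing
                heartbeat.Beat("loop iteration finished");
            }
            finally
            {
                // Runs on a normal return and on most exceptions, so an
                // otherwise silent exit from the thread function leaves a trace.
                heartbeat.Beat("thread function exiting");
            }
        });
        worker.IsBackground = true;
        worker.Start();
        worker.Join();

        // Assumed threshold: longer than the 5-minute wait plus processing time.
        Console.WriteLine(heartbeat.Check(worker, TimeSpan.FromMinutes(10), DateTime.UtcNow));
        // prints "Worker exited; last stage: thread function exiting"
    }
}
```

In the service itself, the string returned by Check would go to LogEvent instead of the console. Recording the last stage is the useful part: if the thread is still alive but silent, the stage name shows whether it is blocked in the wait or somewhere inside CreateRecordsFromIncomingEmail().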