I’m currently investigating a bug that causes Windows to freeze. After the bug occurs, all processes that are already running continue to run, but if you try to interact with them they eventually freeze too.
For example, I had Task Manager and a couple of cmd windows open at the moment of the freeze. Task Manager works fine: it displays processor/memory usage, the list of all processes, etc. But if I try to kill a process, it freezes. If I try File -> New Task, it freezes. In cmd, if I try to start a Windows application, the command executes and the new process appears in Task Manager, but the application never starts up. Even starting a command-line application freezes.
The software in question is a set of 12 different service applications that communicate with each other using WCF. Most of it is written in C#, with some Fortran and C++. All of it runs in user space; we have nothing executing in kernel space.
So my question is: has anyone seen this or similar behavior, and what were the causes? In theory, nothing a user-space application does should be able to freeze the whole OS. Any tips on debugging this situation would also be helpful. Thank you for your time.
Update 1:
We wrote a small application that constantly writes and reads files on disk (with random seeks and repeated opening/closing of the files) and started it before the system froze. The application kept successfully writing, reading, opening and closing files after the freeze. Memory usage is the same as in normal use: between 4 and 5 GB out of the system’s 6 GB.
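For anyone who wants to run the same kind of check, here is a minimal Python sketch of such a disk-stress loop. It is not the application we actually used; the function name `stress_file_io` and the block and file sizes are hypothetical choices for illustration. Each iteration opens the file, seeks to a random offset, writes a block, flushes it to disk, reads it back, and closes the file again.

```python
import os
import random


def stress_file_io(path: str, iterations: int, block_size: int = 4096,
                   file_size: int = 1 << 20, seed: int = 0) -> int:
    """Hypothetical disk-stress loop: open, random seek, write, read back, close.

    Returns the number of iterations whose read-back matched what was written.
    """
    rng = random.Random(seed)
    # Pre-fill the file with zeros so every random seek lands inside existing data.
    with open(path, "wb") as f:
        f.write(b"\0" * file_size)
    ok = 0
    for _ in range(iterations):
        payload = rng.getrandbits(8 * block_size).to_bytes(block_size, "little")
        offset = rng.randrange(0, file_size - block_size + 1)
        # Open and close the file on every iteration, like the test application.
        with open(path, "r+b") as f:
            f.seek(offset)
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # push the data past the OS cache
            f.seek(offset)
            if f.read(block_size) == payload:
                ok += 1
    return ok
```

To use it as a freeze detector, call it repeatedly in an endless loop and print a timestamp after each call. If the timestamps keep advancing after the rest of the system hangs, plain file I/O is still working.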
We also took a memory dump, but we failed to figure out what is happening. The dump, of course, shows that Windows froze in the keyboard driver, but beyond that we couldn’t figure out much. It would be much more useful if we could get a dump of user-space memory as well. OK, writing this sentence made me Google a bit: it appears there is a complete memory dump option. I will research this some more and update on progress.
Our current suspect is the NOD32 firewall: when it’s off, everything appears to work OK. We still need to confirm this and find out what in our code is provoking this behavior.
Thanks everybody for your assistance.
Update 2:
OK, I’ve managed to create a full memory dump. It wasn’t as easy as I hoped; here are some useful resources that may help someone someday:
http://www.osronline.com/article.cfm?article=545
Once the system froze, I started a cmd.exe and ran a copy command. The cmd froze; here is its stack trace:
fffff880`087571d0 fffff800`02cc2992 nt!KiSwapContext+0x7a
fffff880`08757310 fffff800`02cc4d0f nt!KiCommitThreadWait+0x1d2
fffff880`087573a0 fffff800`02cd9d1f nt!KeWaitForSingleObject+0x19f
fffff880`08757440 fffff800`02fc06d6 nt!AlpcpSignalAndWait+0x8f
fffff880`087574f0 fffff800`02fbe660 nt!AlpcpReceiveSynchronousReply+0x46
fffff880`08757550 fffff800`02fcd13d nt!AlpcpProcessSynchronousRequest+0x33d
fffff880`08757670 fffff800`030ade59 nt!LpcpRequestWaitReplyPort+0x9c
fffff880`087576d0 fffff880`05ad1344 nt!LpcRequestWaitReplyPort+0x19
fffff880`08757710 fffff880`05ad430f eamon+0x5344
fffff880`087578d0 fffff880`05ad25bb eamon+0x830f
fffff880`08757970 fffff800`02fd075f eamon+0x65bb
fffff880`087579f0 fffff800`02fb6624 nt!IopCloseFile+0x11f
fffff880`08757a80 fffff800`02fd0251 nt!ObpDecrementHandleCount+0xb4
fffff880`08757b00 fffff800`02fd0164 nt!ObpCloseHandleTableEntry+0xb1
fffff880`08757b90 fffff800`02cba953 nt!ObpCloseHandle+0x94
fffff880`08757be0 00000000`77bff7aa nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`08757be0)
00000000`002fd848 00000000`00000000 ntdll!ZwClose+0xa
Update 3:
After some extensive testing, we have concluded that the issue is related to ESET NOD32 Antivirus. Thank you all for your help and the information provided.
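From the stack dump, the “eamon.sys” driver seems to be in the middle of the battle: the handle close that starts at ntdll!ZwClose goes through nt!IopCloseFile into eamon, which issues an LPC request (nt!LpcRequestWaitReplyPort) and then sits in nt!KeWaitForSingleObject waiting for the reply. Like you said, this driver belongs to ESET’s NOD32 Antivirus.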
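Spotting the third-party module in a kernel stack like this can be partly automated. The following is a rough Python sketch, assuming the trace is pasted as text in the `Child-SP RetAddr Call Site` layout shown in the question. The name `suspect_modules` and the default set of "known" OS modules (`nt`, `ntdll`, `hal`) are illustrative assumptions. A flagged module is only a lead for further investigation, not proof of guilt.

```python
import re

# Matches WinDbg `k`-style frame lines: Child-SP, RetAddr, then the call site.
# Requiring 8+ hex/backtick characters keeps header lines like "Child-SP" out.
FRAME_RE = re.compile(r"^\s*[0-9a-f`]{8,}\s+[0-9a-f`]{8,}\s+(?P<site>\S+)",
                      re.IGNORECASE)

# Assumption: the modules treated as "the OS itself"; extend as needed.
KNOWN_OS_MODULES = frozenset({"nt", "ntdll", "hal"})


def suspect_modules(stack_text, known_modules=KNOWN_OS_MODULES):
    """Return non-OS modules from a pasted stack trace, in order of appearance."""
    found = []
    for line in stack_text.splitlines():
        m = FRAME_RE.match(line)
        if not m:
            continue
        site = m.group("site")
        # The module name ends at '!' (symbolized frame, e.g. "nt!KiSwapContext+0x7a")
        # or at '+' (no symbols, e.g. "eamon+0x5344").
        module = re.split(r"[!+]", site, maxsplit=1)[0].lower()
        if module not in known_modules and module not in found:
            found.append(module)
    return found
```

Run against the trace in Update 2, this should return only `eamon`, the module sitting between the close-handle path and the LPC wait.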
Add to this the fact that you say everything works fine without it, and you can probably stop your research here. Antivirus packages are typically installed, at least in part, as kernel-mode drivers so they can intercept file and network activity efficiently. The downside is that when they misbehave, they can easily hang the whole machine or cause BSODs.
Googling a bit, I found some other similar reports about this particular software (http://www.wilderssecurity.com/archive/index.php/t-259245.html).
You should contact the vendor to find out whether this is a known issue and whether they have an update or a fix.