I am developing a Mac OS X application that runs a large number of background jobs using GCD. The background jobs use CLucene to index documents and access Core Data on a child context.
These jobs are all spawned in short order (using dispatch_async on a queue created with DISPATCH_QUEUE_CONCURRENT), but only 4 do actual work at once. This is accomplished with a dispatch_semaphore_t: each job waits on it when it begins and signals it when it completes.
I’m seeing a very strange, reliably reproducible deadlock when:
- Background jobs are still running
- The user switches focus to another application, and then back
NSWindow is apparently deadlocking trying to send a notification while displaying the menu bar. This is the stack trace of the main thread when this happens:
#0 0x00007fff870ae6c2 in semaphore_wait_trap ()
#1 0x00007fff8b1bf486 in _dispatch_semaphore_wait_slow ()
#2 0x00007fff8b69c12b in -[_NSDNXPCConnection sendMessage:waitForAck:] ()
#3 0x00007fff8b57ced5 in _CFXNotificationPost ()
#4 0x00007fff8b58bbf3 in CFNotificationCenterPostNotification ()
#5 0x00007fff902ae174 in HIS_XPC_CFNotificationCenterPostNotification ()
#6 0x00007fff8bd3612a in BroadcastToolboxMessage ()
#7 0x00007fff8bd6d063 in MenuBarInstance::Show(MenuBarAnimationStyle, unsigned char, unsigned char, unsigned char) ()
#8 0x00007fff8bd98144 in SetMenuBarObscured ()
#9 0x00007fff8bd97e0f in HIApplication::HandleActivated(OpaqueEventRef*, unsigned char, OpaqueWindowPtr*) ()
#10 0x00007fff8bd95407 in HIApplication::EventObserver(unsigned int, OpaqueEventRef*, void*) ()
#11 0x00007fff8bd636e0 in _NotifyEventLoopObservers ()
#12 0x00007fff898dc018 in -[NSWindow sendEvent:] ()
#13 0x00007fff898d8744 in -[NSApplication sendEvent:] ()
#14 0x00007fff897ee2fa in -[NSApplication run] ()
#15 0x00007fff89792cb6 in NSApplicationMain ()
#16 0x0000000100001e52 in main at /Users/mspong/dev/Indx/Indx/Indx/main.m:13
#17 0x00007fff86b7b7e1 in start ()
All running background jobs finish their work, but no further jobs get access to the aforementioned semaphore. Every thread is stuck on semaphore_wait_trap.
I can’t imagine what I could possibly be doing to (apparently) cause unrelated semaphores (both mine and Apple’s) to get stuck. Can anybody offer some advice on how to investigate this further?
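Is it possible that you’re hitting the GCD concurrent queue thread limit (64 threads), and then doing something that tries to do work on a concurrent queue? Every job that is parked on your semaphore still occupies a worker thread, so once enough jobs are waiting there are no threads left to run anything else that is dispatched, most likely including the blocks Apple’s own frameworks rely on. That would cause seemingly random deadlocks across the entire framework.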
If that’s the case, my only recommendation is: never block in a concurrent queue.