Using ehCache 2.4.4, I seem to have gotten into a deadlock on the ehCache Segment object. From other logging, I know that the ‘waiting thread’, 1694 last ran anything 9 hours before this stack trace was generated. In the meantime, 1696 has gone and done a lot of other work, so this lock is definitely being held errantly.
I’m pretty confident that I am not directly locking any Segment instances directly, so I assume this is some kind of issue internal to the library. Any ideas?
"Model Executor - 1696" Id=1696 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@92eb1ed
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
at java.util.concurrent.PriorityBlockingQueue.poll(Unknown Source)
at com.rtrms.application.modeling.local.BlockingTaskList.takeTask(BlockingTaskList.java:20)
at com.rtrms.application.modeling.local.ModelExecutor.executeNextTask(ModelExecutor.java:71)
at com.rtrms.application.modeling.local.ModelExecutor.run(ModelExecutor.java:46)
Locked synchronizers: count = 1
- java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@4a3d767f
"Model Executor - 1694" Id=1694 in WAITING on lock=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@4a3d767f
owned by Model Executor - 1696 Id=1696
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(Unknown Source)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(Unknown Source)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(Unknown Source)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(Unknown Source)
at net.sf.ehcache.store.compound.Segment.unretrievedGet(Segment.java:248)
at net.sf.ehcache.store.compound.CompoundStore.unretrievedGet(CompoundStore.java:191)
at net.sf.ehcache.store.compound.impl.DiskPersistentStore.containsKeyInMemory(DiskPersistentStore.java:72)
at net.sf.ehcache.Cache.searchInStoreWithStats(Cache.java:1884)
at net.sf.ehcache.Cache.get(Cache.java:1549)
at com.rtrms.amoeba.cache.DistributedModeledSecurities.get(DistributedModeledSecurities.java:57)
at com.rtrms.amoeba.modeling.AssertPersistedModeledSecurities.get(AssertPersistedModeledSecurities.java:44)
at com.rtrms.application.modeling.tasks.ExpandableModelingTask.getNextUnexecutedTask(ExpandableModelingTask.java:35)
at com.rtrms.application.modeling.local.BlockingTaskList.takeTask(BlockingTaskList.java:36)
at com.rtrms.application.modeling.local.ModelExecutor.executeNextTask(ModelExecutor.java:71)
at com.rtrms.application.modeling.local.ModelExecutor.run(ModelExecutor.java:46)
Locked synchronizers: count = 0
Turns out that calls like Cache.acquireWriteLockOnKey end up obtaining a lock on the internal Segment, so this apparent deadlock was caused by a .unlock call that wasn’t in a finally block.
Editorial comment: It also implies that you can get contention trying to lock two different keys that just happened to be in the same Segment, which is pretty unfortunate.