I have a "simple" 4 class example that reliably shows unexpected behavior from java synchronization on multiple machines. As you can read below, given the contract of the java sychronized keyword, Broke Synchronization should never be printed from the class TestBuffer.
Here are the 4 classes that will reproduce the issue (at least for me). I’m not interested in how to fix this broken example, but rather why it breaks in the first place.
And here is the output I get when I run it:
java -cp . SyncTest
Before Adding
Creating a TestBuffer
Before Remove
Broke Synchronization
1365192
Broke Synchronization
1365193
Broke Synchronization
1365194
Broke Synchronization
1365195
Broke Synchronization
1365196
Done
UPDATE:
@Gray has the simplest example that breaks thus far. His example can be found here: Strange JRC Race Condition
Based on the feedback I’ve gotten from others, it looks like the issue may occur on Java 64-bit 1.6.0_20-1.6.0_31 (unsure about newer 1.6.0’s) on Windows and OSX. Nobody has been able to reproduce the issue on Java 7. It may also require a multi-core machine to reproduce the issue.
ORIGINAL QUESTION:
I have a class which provides the following methods:
- remove – Removes the given item from the list
- getBuffer – Iterates over all the items in the list
I’ve reduced the problem down to the 2 functions below, both of which are in the same object and they’re both synchronized. Unless I am mistaken, "Broke Synchronization" should never be printed because insideGetBuffer should always be set back to false before remove can be entered. However, in my application it is printing "Broke Synchronization" when I have 1 thread calling remove repeatedly while the other calls getBuffer repeatedly. The symptom is that I get a ConcurrentModificationException.
See Also:
Very strange race condition which looks like a JRE issue
Sun Bug Report:
This was confirmed as a bug in Java by Sun. It is apparently fixed (unknowingly?) in jdk7u4, but they have not backported the fix to jdk6.
Bug ID: 7176993
I think you are indeed looking at a JVM bug in the OSR. Using the simplified program from @Gray (slight modifications to print an error message) and some options to mess with/print JIT compilation, you can see what is going on with the JIT. And, you can use some options to control that to a degree that can suppress the issue, which lends a lot of evidence to this being a JVM bug.
Running as:
you can get an error condition (like others about 80% of the runs) and the compilation print somewhat like:
There are three OSR replacements (the ones with the % annotation on the compile ID). My guess is that it is the third one, which is the loop calling remove(), that is responsible for the error. This can be excluded from JIT via a .hotspot_compiler file located in the working directory with the following contents:
When you run the program again, you get this output:
and the problem does not appear (at least not in the repeated attempts that I’ve made).
Also, if you change the JVM options a bit, you can cause the problem to go away. Using either of the following I cannot get the problem to appear.
Interestingly, the compilation output for both of these still show the OSR for the remove loop. My guess (and it is a big one) is that delaying the JIT via the compilation threshold or changing the FreqInlineSize cause changes to the OSR processing in these cases that bypass a bug that you are otherwise hitting.
See here for info on the JVM options.
See here and here for information on the output of -XX:+PrintCompilation and how to mess with what the JIT does.