We have a bug in our application that does not occur every time, so we don’t know its “logic”. Today I couldn’t reproduce it even once in 100 attempts.
Disclaimer: This bug exists and I’ve seen it. It’s not a PEBKAC or anything similar.
What are common techniques for reproducing this kind of bug?
Analyze the problem with a partner and read the code together. Make notes of the problems you KNOW to be true and try to work out which logical preconditions must hold for this to happen. Follow the evidence like a CSI.
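One way to act on those notes is to turn each deduced precondition into an assertion at the point where it must hold. A violation then fails immediately and loudly instead of silently corrupting state that only shows up much later. This is a hypothetical sketch: the Order class, the ship() function and its three preconditions are invented for illustration and are not from the original question.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    # Hypothetical domain object, used only to illustrate the idea.
    order_id: int
    items: list = field(default_factory=list)
    paid: bool = False
    shipped: bool = False


def ship(order: Order) -> None:
    # Preconditions deduced while reading the code together. If the bug can
    # only happen when one of them is violated, the assertion fires at the
    # moment of violation rather than wherever the damage surfaces later.
    assert order.items, f"order {order.order_id}: shipping an empty order"
    assert order.paid, f"order {order.order_id}: shipping before payment"
    assert not order.shipped, f"order {order.order_id}: shipped twice"
    order.shipped = True


# Usage: the second call violates a precondition and fails loudly.
order = Order(order_id=42, items=["book"], paid=True)
ship(order)
try:
    ship(order)
except AssertionError as err:
    print(err)  # order 42: shipped twice
```

Keep in mind that Python strips assert statements when run with -O, so make sure the checks are active in whatever build you use to hunt the bug.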
Most people instinctively say “add more logging”, and that may be a solution. But for many problems it just makes things worse, because logging can change timing dependencies enough to make the problem more or less frequent. Changing the frequency from 1 in 1,000 to 1 in 1,000,000 will not bring you any closer to the true source of the problem.
Even if your logical reasoning does not solve the problem, it will probably give you a few specifics you could investigate with logging or assertions in your code.
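If you do add logging, one way to reduce (though not eliminate) its timing impact is to record events into a small in-memory ring buffer and only format and print them when something goes wrong, for example when one of your assertions fires. The following is a minimal sketch under that assumption; the TraceBuffer name and its capacity parameter are hypothetical. The lock and the timestamp call still cost some time, so this can still shift a race, just less than writing to a file or console on every event.

```python
import collections
import threading
import time


class TraceBuffer:
    """Keep only the most recent `capacity` events in memory; dump on demand."""

    def __init__(self, capacity: int = 1024) -> None:
        # A deque with maxlen silently discards the oldest entry when full.
        self._events = collections.deque(maxlen=capacity)
        self._lock = threading.Lock()

    def record(self, event: str) -> None:
        # Store raw values only; string formatting is deferred to dump()
        # so the hot path stays as cheap as possible.
        entry = (time.perf_counter_ns(), threading.get_ident(), event)
        with self._lock:
            self._events.append(entry)

    def dump(self) -> list:
        # Take a snapshot under the lock, then format outside of it.
        with self._lock:
            snapshot = list(self._events)
        return [f"{t} [thread {tid}] {event}" for t, tid, event in snapshot]


# Usage: keep only the three most recent events.
trace = TraceBuffer(capacity=3)
for i in range(5):
    trace.record(f"step {i}")
for line in trace.dump():
    print(line)  # only "step 2", "step 3" and "step 4" survive
```

In practice you would call dump() from the handler that catches a failed assertion, so the expensive output happens only after the bug has already occurred.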