I connect to a REST service using an issued certificate and HTTP basic auth. Connecting works fine, and I submit requests (a few per second at most) without trouble. However, every ten thousand requests or so I get a javax.net.ssl.SSLHandshakeException with the message Received fatal alert: handshake_failure. The response time for normal, successful requests is <100 ms, but these exceptions are preceded by extreme delays of up to 30 seconds.
I know very little about this stuff, but I would like to assume that the exception is due to network problems or congestion, or to problems on the receiving end, rather than a certificate/authentication problem on my side, since ~99.99% of my requests succeed. Is that assumption reasonable, or can someone shed some light on what might be causing this? Thanks.
Are you sure your network isn’t having an issue with Spanning Tree Protocol (STP) recalculation? Going by the high-level description, STP issues can create the same “network out, but wait, it’s back again” periodic failures.
If it is an STP issue, turning STP off is the wrong solution. You need to identify which device is mishandling the STP traffic, and either reconfigure that device to “turn on” STP or replace it (if it actually mishandles STP).
If this stab in the dark doesn’t get you anywhere, a real analysis of the network is called for. Read the router / switch logs (assuming you have devices that make those available), and perhaps build a Wireshark diagnostic computer with two NICs so you can “attach” a listening device on particular wire links. Wireshark works on wireless too, and as this is a network issue, you might find the answer on a different network than you expected.
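To line up the client-side failures with whatever the router / switch logs or a Wireshark capture show, it may help to record exactly when each slow or failed request started. Below is a minimal sketch of that idea, not a drop-in solution: RequestTimer, Incident and the slowThreshold parameter are hypothetical names of mine, and the sketch assumes you can wrap whatever call sends your request in a Callable.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import javax.net.ssl.SSLHandshakeException;

/** Hypothetical helper: times each request and records the slow or failed ones. */
public class RequestTimer {

    /** One recorded incident; error is null when the call was slow but succeeded. */
    public static final class Incident {
        public final Instant startedAt;
        public final Duration elapsed;
        public final String error;

        Incident(Instant startedAt, Duration elapsed, String error) {
            this.startedAt = startedAt;
            this.elapsed = elapsed;
            this.error = error;
        }
    }

    private final Duration slowThreshold;
    private final List<Incident> incidents =
            Collections.synchronizedList(new ArrayList<>());

    /** slowThreshold is an assumed cut-off, e.g. 1 s when normal calls take <100 ms. */
    public RequestTimer(Duration slowThreshold) {
        this.slowThreshold = slowThreshold;
    }

    /** Runs the request, recording it if it is slow or fails during the TLS handshake. */
    public <T> T timed(Callable<T> request) throws Exception {
        Instant startedAt = Instant.now();   // wall-clock time, to match against device logs
        long t0 = System.nanoTime();         // monotonic clock, for the duration itself
        try {
            T result = request.call();
            Duration elapsed = Duration.ofNanos(System.nanoTime() - t0);
            if (elapsed.compareTo(slowThreshold) > 0) {
                incidents.add(new Incident(startedAt, elapsed, null));
            }
            return result;
        } catch (SSLHandshakeException e) {
            Duration elapsed = Duration.ofNanos(System.nanoTime() - t0);
            incidents.add(new Incident(startedAt, elapsed, e.getMessage()));
            throw e;                         // rethrow so the caller's handling is unchanged
        }
    }

    /** Live, synchronized list of everything recorded so far. */
    public List<Incident> incidents() {
        return incidents;
    }
}
```

If the startedAt timestamps of the recorded incidents cluster around topology-change entries in the switch logs, that would point toward the network rather than your certificate setup. If they don’t, that is useful to know as well.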
I had a hell of a time with two devices that set up an impromptu wireless bridge (creating a loop) when one of my switches (an old Linksys) didn’t properly handle STP. I would have had a much harder time figuring it out without the help of Wireshark (and a lot of reading up on the protocols I was seeing to determine if they were “working correctly”).