From what I understand, carriers drop any TCP connections that are idle for some number of minutes. This is why, if you want to maintain a persistent TCP connection from your clients to your backend, you have to send keep-alives both ways.
My question is: what should that keep-alive interval be?
To be clear, nobody can remove a TCP connection except the end-points. That’s because the IP network protocol is all that the network sees and it is stateless by design.
What counts as an “end point”, however, may not be what you expect. A carrier can put a transparent proxy or a NAT router in between, and those do need to keep state in order to forward the data properly.
NAT is your biggest problem because it’s more common. If the router decides to drop the state for a connection because it hasn’t seen traffic for a while, an endpoint will never know until the next time it tries to send data.
Enabling SO_KEEPALIVE gives you a default idle time of 2 hours before the first probe is sent. A well-behaved router should thus hold state for at least that long, but don’t bet the farm on it.
To answer your specific question… If it were me, I’d use 15 minutes or less.
Note that only one side needs to send the keep-alives, as they work by re-sending the last byte of the data stream as though it had been lost on the network. The receiver discards it because it has already seen it, but sends a new ACK in reply, so traffic flows in both directions.
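If it helps, here is a minimal sketch of turning on SO_KEEPALIVE with a 15 minute idle time in Python. The function name enable_keepalive and the interval_s and probes defaults are my own picks for illustration, not anything the protocol requires. The per-socket timer options (TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT, or TCP_KEEPALIVE on macOS) are platform-specific, so the code only sets the ones your Python build exposes; where none exist you get the system default.

```python
import socket


def enable_keepalive(sock, idle_s=15 * 60, interval_s=60, probes=5):
    """Turn on TCP keep-alive and shorten its timers where the platform allows.

    idle_s:     seconds of silence before the first probe (15 minutes here)
    interval_s: seconds between unanswered probes (illustrative value)
    probes:     unanswered probes before the connection is reported dead
    """
    # Portable part: ask the OS to send keep-alive probes on this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Platform-specific part: override the 2 hour default idle time.
    if hasattr(socket, "TCP_KEEPIDLE"):        # Linux and some others
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    elif hasattr(socket, "TCP_KEEPALIVE"):     # macOS name for the idle time
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle_s)

    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock
```

You would call this once on the socket on whichever side you control, for example enable_keepalive(sock) right after creating or connecting it. Since the probe and its ACK produce traffic in both directions, doing it on one end should be enough to keep a NAT entry fresh.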