I’m trying to improve the performance of a (virtual) web server with a fairly standard CentOS/Apache setup and one thing I noticed is that new connections seem to “stick” in the SYN_RECV state, sometimes for several seconds, before finally being established and handled by Apache.
My first guess was that Apache could be reaching the limit on the number of connections it’s prepared to handle simultaneously. But with keep-alive off, netstat reports only a few established connections (counting just those not involving localhost, so discarding “housekeeping” connections, e.g. between Apache and Tomcat), whereas with keep-alive on it will happily get up to 100+ established connections. There is no clear difference in the SYN_RECV behaviour either way: there are typically 10-20 connections sitting in SYN_RECV at any one time.
What are people’s recommendations for investigating where the bottleneck is that’s preventing the connections from being established quickly?
P.S. Follow-on question: does anybody know what a TYPICAL statistic would be for the time for a connection to be established once first “hitting” the server?
Update in case anyone else encounters this: in the end, I wrote a small Java program to read and analyse the data in /proc/net/tcp. It appears that this is happening for only a small proportion of connections (although at any one time there can still be a number of connections in this state, because they can stay that way for several seconds), and it looks like an issue local to those connections. Over 90% of connections are still going through in < 500ms and 81% in < 200ms. So if others see this, there isn’t necessarily any need for immediate panic.
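For anyone wanting to try something similar, here is a minimal sketch of the kind of thing such a program might do (it is not the original program). It assumes the usual /proc/net/tcp layout, where the fourth whitespace-separated field is the socket state as a hex code and 03 means SYN_RECV. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts sockets per TCP state code in /proc/net/tcp-style text.
class TcpStateCount {

    // Assumed state code for SYN_RECV in /proc/net/tcp (hex).
    static final String SYN_RECV = "03";

    // Returns a map from state code (e.g. "01", "03", "0A") to number of sockets.
    static Map<String, Integer> countStates(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            // Data rows start with an index such as "0:"; the header row does not.
            if (f.length < 4 || !f[0].endsWith(":")) {
                continue;
            }
            counts.merge(f[3].toUpperCase(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("/proc/net/tcp");
        if (!Files.exists(path)) {
            System.out.println("/proc/net/tcp not found (not a Linux host?)");
            return;
        }
        Map<String, Integer> counts = countStates(Files.readAllLines(path));
        System.out.println("Sockets per state code: " + counts);
        System.out.println("SYN_RECV: " + counts.getOrDefault(SYN_RECV, 0));
    }
}
```

Running this repeatedly (say, once a second) and comparing snapshots is one way to estimate how long individual connections stay in SYN_RECV.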
Try capturing a packet trace and check whether SYN-ACKs are being retransmitted (and how many retransmissions there are). This could indicate a routing issue (the SYN comes in via path A and the SYN-ACK goes out via path B, which is broken).
Also see if these connections have a specific pattern (such as originating from the same network).
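As a rough way to look for such a pattern without a full packet trace, the sketch below groups the remote addresses of SYN_RECV sockets in /proc/net/tcp by /24 network. It assumes IPv4 entries only, that state code 03 means SYN_RECV, and that the address is printed as 8 hex digits in little-endian byte order (as on x86). The /24 grouping and all names are arbitrary choices for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups SYN_RECV peers by /24 network.
class SynRecvByNetwork {

    // Decodes e.g. "0100007F" to "127.0.0.1" (assumes little-endian hex).
    static String decodeIpv4(String hex) {
        int a = Integer.parseInt(hex.substring(6, 8), 16);
        int b = Integer.parseInt(hex.substring(4, 6), 16);
        int c = Integer.parseInt(hex.substring(2, 4), 16);
        int d = Integer.parseInt(hex.substring(0, 2), 16);
        return a + "." + b + "." + c + "." + d;
    }

    // Returns a map from "x.y.z.0/24" to the number of SYN_RECV sockets from it.
    static Map<String, Integer> groupBySlash24(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            // Skip the header row, malformed rows, and sockets not in SYN_RECV.
            if (f.length < 4 || !f[0].endsWith(":") || !f[3].equals("03")) {
                continue;
            }
            String[] remote = f[2].split(":");
            if (remote.length != 2 || remote[0].length() != 8) {
                continue; // not an IPv4 "ADDR:PORT" pair
            }
            String ip = decodeIpv4(remote[0]);
            String network = ip.substring(0, ip.lastIndexOf('.')) + ".0/24";
            counts.merge(network, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("/proc/net/tcp");
        if (!Files.exists(path)) {
            System.out.println("/proc/net/tcp not found (not a Linux host?)");
            return;
        }
        groupBySlash24(Files.readAllLines(path))
            .forEach((net, n) -> System.out.println(net + "  " + n));
    }
}
```

If most of the stuck connections come from one or two networks, that points towards a path problem for those clients rather than a problem on the server itself.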