I get the following errors in Airbrake if my staging (2 servers) or production (4 servers) servers have no activity for about 15 minutes. Here are the error messages:
ActiveRecord::StatementInvalid: PG::Error: could not receive data from server: Connection timed out
OR
PG::Error: could not connect to server: Connection timed out
Is the server running on host "tci-db4.dev.prod" and accepting TCP/IP connections on port 5432?
I’m using PostgreSQL as my database. One of the servers also acts as the db server.
Environment:
Ruby 1.9.3 (This also happened under Ruby 1.8.7, but it has been worse since upgrading: when the server loses the db connection, the ruby process goes to 100% and stays at 100% until it is killed.)
Rails 3.1.6
pg gem 0.13.2
Postgres 9.1
Phusion Passenger
This problem has been happening for over a year, so I’m hoping someone has some insight on how to fix it. Thanks.
Check your TCP/IP socket timeout settings on all routers/switches between the application servers and the database servers. Also turn on logging on the database side, watch the full life cycle of the connection, and compare the timing to the errors in your application. I suggest turning on the following settings in postgresql.conf until you get an idea of what to look for:
These can be activated with a SIGHUP of the postgres process (or by running “SELECT pg_reload_conf();” as a database superuser).
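As a minimal sketch of connection life-cycle logging, settings along these lines exist in PostgreSQL 9.1 and record when each session opens and closes and from which host. The particular selection and the prefix format are an assumption for illustration, not necessarily the exact list meant above:

```
# postgresql.conf (illustrative selection, adjust to taste)
log_connections = on        # log each successful connection
log_disconnections = on     # log session end, including session duration
# %m = timestamp with milliseconds, %p = backend process ID,
# %r = remote host and port, %u = user name, %d = database name
log_line_prefix = '%m [%p] %r %u@%d '
```

With the timestamp and remote host in every line, the disconnect entries can be matched against the times of the Airbrake errors on each application server.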
I’ll bet that you have a “connection closed by remote host” or something similar as the last message before the actual disconnect is logged.
I’ve seen this before, and the cause was the timeout settings on an intermediate switch.
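If an intermediate device does turn out to be dropping idle connections, one common mitigation is to have PostgreSQL send TCP keepalives more often than that device's idle timeout. This is an assumption about what might help, not something confirmed for this setup. The values below are illustrative only and should sit below whatever timeout you actually find:

```
# postgresql.conf (illustrative values, assuming an idle timeout of roughly 15 minutes)
tcp_keepalives_idle = 300      # seconds of inactivity before the first keepalive probe
tcp_keepalives_interval = 30   # seconds between unanswered probes
tcp_keepalives_count = 3       # unanswered probes before the connection is considered dead
```

These parameters apply only to TCP connections, not Unix-domain sockets, and only on operating systems that support the underlying socket options. A value of 0 means the system default is used.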