I have a PostgreSQL 9.1.3 streaming replication setup on Ubuntu 10.04.2 LTS (primary and

Question

0

Asked: June 3, 20262026-06-03T16:17:57+00:00 2026-06-03T16:17:57+00:00

I have a PostgreSQL 9.1.3 streaming replication setup on Ubuntu 10.04.2 LTS (primary and

0

I have a PostgreSQL 9.1.3 streaming replication setup on Ubuntu 10.04.2 LTS (primary and standby). Replication is initialized with a streamed base backup (pg_basebackup). The restore_command script tries to fetch the required WAL archives from a remote archive location with rsync.

Everything works like described in the documentation when the restore_command script fails with an exit code <> 255:

At startup, the standby begins by restoring all WAL available in the archive location, calling restore_command. Once it reaches the end of WAL available there and restore_command fails, it tries to restore any WAL available in the pg_xlog directory. If that fails, and streaming replication has been configured, the standby tries to connect to the primary server and start streaming WAL from the last valid record found in archive or pg_xlog. If that fails or streaming replication is not configured, or if the connection is later disconnected, the standby goes back to step 1 and tries to restore the file from the archive again. This loop of retries from the archive, pg_xlog, and via streaming replication goes on until the server is stopped or failover is triggered by a trigger file.

But when the restore_command script fails with exit code 255 (because the exit code from a failed rsync call is returned by the script) the server process dies with the following error:

2012-05-09 23:21:30 CEST - @  LOG:  database system was interrupted; last known up at     2012-05-09 23:21:25 CEST
2012-05-09 23:21:30 CEST - @  LOG:  entering standby mode
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: unexplained error (code 255) at io.c(601) [Receiver=3.0.7]
2012-05-09 23:21:30 CEST - @  FATAL:  could not restore file "00000001000000000000003D" from archive: return code 65280
2012-05-09 23:21:30 CEST - @  LOG:  startup process (PID 8184) exited with exit code 1
2012-05-09 23:21:30 CEST - @  LOG:  aborting startup due to startup process failure

So my question is now: Is this a bug or is there a special meaning of exit code 255 which is missing in the otherwise excellent documentation or am I missing something else here?

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-03T16:18:00+00:00

On the primary server, you have WAL files sitting in the pg_xlog/ directory. While WAL files are there, PostgreSQL is able to deliver them to the standby should they be requested.

Typically, you also have local archived WAL location, when files are moved there by PostgreSQL, they no longer can be delivered to the standby on-line and standby is expecting them to come from the archived WAL location via restore_command.

If you have different locations for archived WALs setup on primary and on standby servers, then there’s no way for a while to reach standby and you have a gap.

In your case this might mean, that:

00000001000000000000003D had been archived by the primary PostgreSQL;
standby’s restore_command doesn’t see it from the configured source location.

You might consider manually copying missing WAL files from primary to the standby using scp or rsync. It is also might be necessary to review your WAL locations and make sure both servers look in the same direction.

EDIT:
grep-ing for restore_command in sources, only access/transam/xlog.c references it. In function RestoreArchivedFile almost at the end (round line 3115 for 9.1.3 sources), there’s a check whether restore_command had exited normally or had it received a signal.

In first case, message is classified as DEBUG2. In case restore_command received a signal other then SIGTERM (and wasn’t able to handle it properly I guess), a FATAL error will be reported. This is true for all codes greater then 125.

I will not be able to tell you why though.
I recommend asking on the hackers list.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I have a PostgreSQL 9.1.3 streaming replication setup on Ubuntu 10.04.2 LTS (primary and

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply