Low Sodium: What to do when your timeline isn't right

Wednesday, March 21, 2012

A lot of us are experienced with a single primary and single secondary replication setup in Postgres. Failover is straightforward. Your primary goes away, and the secondary is promoted to a primary. Then you create a new secondary in some manner. Maybe there is some floating IP magic in there to keep the clients happy in a transparent fashion. No problem.

The truth is though, since hot standby became available in 9.0, a lot of us are using more complicated setups these days. We'll have a primary, and then a failover secondary. Then maybe a secondary that is used for generating reports. Maybe one that we make backups on. Maybe even several others that we use for load balancing read-only queries. What happens in a primary failure situation now?

If you think about this for a little bit, the outlook seems dismal. Your failover machine switches timelines and now all your secondaries have a timeline mismatch with the new primary. Time to refresh all those databases from the primary? That doesn't sound fun. Especially if you have a few that are load balancing read-only queries in production. Ugh.

Good news, everyone, you can migrate your secondaries to a new primary and timeline!

First, set up your primary and secondary for failover. Use the hybrid method (streaming + WAL shipping) for all replication. Have a floating IP ready to be moved over to the failover database. Have the failover configured so that it is ready to be a primary. This means having your streaming options, like max_wal_senders, already set up. Have a replication entry in the pg_hba.conf. Have an archive_command already in place to start archiving WAL segments out to all the other secondaries. The only big difference between the primary and failover secondary configs should be that the archive_command does not send WAL segments to itself. Make sure you have a trigger_file specified in the recovery.conf as well.

So far this is probably similar to what you already have. The key here is to use the WAL shipping in addition to streaming (hybrid method). This makes sure that all the other secondaries will get the new history file. It's also important that your archive_mode is 'on' and your archive_command will work as soon as failover is triggered. The first two files it will archive are critical to the process. I use OmniPITR for all the WAL shipping as well as the WAL restore and cleanup on the secondary side.
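As a rough sketch of the settings described above, the shell helper below prints what the failover secondary's configuration might contain. Every concrete value is an assumption for illustration: the replication user, the subnet, the floating IP, the trigger file path, the archive directory, and the ship_wal.sh wrapper (a stand-in for whatever WAL shipping tool you use, such as OmniPITR) are all hypothetical.

```shell
#!/bin/sh
# All values below are hypothetical placeholders, not taken from the article.
REPL_USER="replicator"
REPL_NET="10.0.0.0/24"
FLOATING_IP="10.0.0.100"
TRIGGER_FILE="/var/lib/postgresql/failover.trigger"

# postgresql.conf settings for the failover secondary (9.0/9.1 parameter names).
failover_postgresql_conf() {
    cat <<'EOF'
wal_level = hot_standby
max_wal_senders = 5
hot_standby = on
archive_mode = on
# Hypothetical wrapper script: ship to every secondary except this host.
archive_command = '/usr/local/bin/ship_wal.sh "%p" "%f"'
EOF
}

# pg_hba.conf entry that lets the other secondaries stream from this host.
failover_pg_hba() {
    echo "host  replication  $REPL_USER  $REPL_NET  md5"
}

# recovery.conf for the failover secondary while it is still a standby.
failover_recovery_conf() {
    cat <<EOF
standby_mode = 'on'
primary_conninfo = 'host=$FLOATING_IP user=$REPL_USER'
restore_command = 'cp /mnt/server/archivedir/%f %p'
trigger_file = '$TRIGGER_FILE'
EOF
}

failover_postgresql_conf
failover_pg_hba
failover_recovery_conf
```

The point of having archive_mode and archive_command set ahead of time is that archive_mode cannot be changed without a restart, so it has to be in place before the trigger file is touched.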

Next you need to have all your other secondaries point to the floating IP. This is so that at failover time they will be able to seamlessly connect to the new primary. They will also need an additional line in their recovery.conf:

recovery_target_timeline = 'latest'

This tells them to follow whatever timeline change they see. So now when your newly promoted primary pushes them a history file via WAL shipping, they will honor it and switch timelines too. This feature isn't documented in 9.0 (documentation bug?) but it still has an effect.
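For the other secondaries, a complete recovery.conf might look something like the output of this small sketch. The floating IP, the replicator user, and the archive directory are assumptions for illustration; only the recovery_target_timeline line comes from the setup described above.

```shell
#!/bin/sh
# Print a recovery.conf for a read-only secondary.
# $1 is the floating IP (hypothetical value in the example call below).
secondary_recovery_conf() {
    floating_ip="$1"
    cat <<EOF
standby_mode = 'on'
# Always connect through the floating IP, never a fixed host.
primary_conninfo = 'host=$floating_ip user=replicator'
# Shipped WAL and history files are restored from the local archive.
restore_command = 'cp /mnt/server/archivedir/%f %p'
# Follow whatever timeline change shows up in the archive.
recovery_target_timeline = 'latest'
EOF
}

secondary_recovery_conf 10.0.0.100
```

The restore_command matters here because the history file arrives through WAL shipping, not through the streaming connection.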

Now you should be ready to test your new setup. You do test, right? Good.

Unplug your primary.

Touch the trigger_file on the failover secondary.

Bring up the floating IP on your failover secondary.
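The last two steps could be scripted roughly as follows. This is only a sketch: the trigger file path, the floating IP, and the interface name are hypothetical, and by default the script only prints what it would do. It assumes the old primary is already down and unreachable.

```shell
#!/bin/sh
# Hypothetical values; none of these come from the article.
TRIGGER_FILE="/var/lib/postgresql/failover.trigger"  # must match trigger_file in recovery.conf
FLOATING_IP="10.0.0.100/24"
IFACE="eth0"
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

promote_failover() {
    # 1. Touching the trigger file ends recovery and promotes the secondary.
    run touch "$TRIGGER_FILE"
    # 2. Bring up the floating IP so clients and secondaries reach the new primary.
    run ip addr add "$FLOATING_IP" dev "$IFACE"
}

promote_failover
```

Running it with DRY_RUN=0 on the failover secondary would execute the commands for real. Promoting first and moving the IP second follows the order of the steps above.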

You should now have a newly promoted primary in a new timeline. This is where the differences between 9.0 and 9.1 show up. In 9.0 the streaming protocol is one-way, primary to secondary. In my testing the connections in 9.0 hung indefinitely. I even adjusted the tcp_keepalive settings. Nothing seemed to help short of restarting the database. Ultimately, this is still easier than refreshing from the new primary, so I figure it still counts. In 9.1 there are some new feedback options. The option wal_receiver_status_interval is particularly useful. It is enabled by default with a value of 10 seconds and is meant to update the primary about the status of the secondary with regard to replication. In our scenario it also lets the secondary notice that the primary has disappeared. The secondary then runs its restore_command, finds the history file, and changes timelines, after which it connects back to the floating IP, which now points to the new primary.
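One way to check which timeline a server is on is to read it out of pg_controldata. The helper below is a minimal sketch that extracts the value from the "Latest checkpoint's TimeLineID" line of that output. The function name is made up for this example.

```shell
#!/bin/sh
# Read pg_controldata output on stdin and print the timeline ID.
timeline_id() {
    awk -F: '/Latest checkpoint.s TimeLineID/ {
        gsub(/[ \t]/, "", $2)   # strip the padding around the number
        print $2
        exit
    }'
}

# Intended use on a database host (not run here):
#   pg_controldata "$PGDATA" | timeline_id
```

Keep in mind that this value comes from the control file, so on a secondary it may lag until the next restartpoint. The presence of the new history file in pg_xlog is another sign that the switch was picked up.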

In this post I have been a little light on the technical details. I have made some assumptions about your level of knowledge. If you want some more in-depth information, here are some links to the documentation:

10 comments:

grayhemp, March 22, 2012 at 3:02 AM
Thank you for the article, it is very informative.

BTW, setting up the hybrid method (streaming + archive_command) is not necessary to solve this task. If one has pure streaming replication, then to switch a replica to a new master one just needs to add

recovery_target_timeline = 'latest'

in the recovery.conf on the replica. Then stop the service on the replica, rsync pg_xlog to the replica and start the service.

Probably we can even replace the last 3 actions with just "rsync pg_xlog to the replica and restart the service" but I have not tested it.

ioguix, March 22, 2012 at 8:22 AM
Yeah, this method is helpful, but there's a warning: the slave you promote MUST be the one that was the most up-to-date with the master before it failed.

If you promote the wrong one, slaves that had a smaller lag with the master might be corrupted, but will not complain.

Checking this is pretty reliable with 9.1, but IIRC there was some issue with 9.0 where pg_current_xlog_location was able to step back in some situations... but my memory might be wrong.

About the tcp_keepalive, how/where did you set it up? Did you try to set it up on the system side?

ioguix, March 28, 2012 at 8:05 AM
Might be both loss of updates and corruption.

Worst case: you promote the most lagged slave. It starts receiving writes and streams them to other slaves that had more up-to-date data.

If I understand it correctly, either that data will be erased by new writes or, worse, some will stay visible. Think about clog as well: a txid which committed some data in t1 might have committed some other data in t2 in the previous timeline... Anyway, it just looks like a big and dangerous mess.

Lucas, January 9, 2016 at 5:52 AM
Hi there! I've got the same problem... Can you guys help me to solve it, please?

You can check the details here: http://dba.stackexchange.com/questions/125709/recovery-from-live-to-a-new-slave-server-postgresql-error

Master1(A) ---> SLAVE1(B) ----> SLAVE2(C, NEW SLAVE)

SLAVE1(B) - Replication from MASTER1(A)
SLAVE2(C) - Replication from SLAVE1(B)

I got the following error:
2016-01-09 01:13:39.183 UTC|774|FATAL: timeline 2 of the primary does not match recovery target timeline 4