Postgres replica conflicts: Part 2

kiran
3 min readFeb 5, 2022

How to avoid and what is max_standby_streaming_delay

Photo by Erik Mclean on Unsplash

There are many references which describe how to fix the snapshot replica conflict.I will mention some of them at the end of this post.

All the below tweaks are on the replica.

max_standby_streaming_delay is one of those parameters which can be tweaked for fixing this issue to an extent.I had to go through umpteen docs to actually understand what this parameter is and what it does.
So this post is not about which is the best way to fix the conflict issue but to understand the nuances of max_standby_streaming_delay.

Delay WAL Application on Replica

One way to fix the replica conflict is to delay the application of the wal (which in our case was about to delete the dead rows) on the replica where our query is running or cancel the query itself.

max_standby_streaming_delay is one such param which defines the maximum allowed delay in WAL application and even after this delay if still the query is running, its cancelled.

Lets run through few scenarios for which lets fix max_standby_streaming_delay as 30 seconds and assume t1 is a conflicting query on replica.

  • A transaction t1 started at 10:01:00 on replica.
    At 10:01:30, a wal arrives at the replica (dead rows deletion change).
    At 10:01:35, t1 completes.
    As t1 which is in conflict with wal completed, wal gets applied now.
  • A transaction t1 started at 10:01:00 on replica.
    At 10:01:30, a wal arrives at the replica (dead rows deletion change).
    At 10:01:30, t1 is still running and it is now in conflict state with the wal.
    At 10:02:00, t1 is still running.Wal changes were delayed by 30 seconds from 10:01:30 to 10:02:00 but still t1 did not complete, hence t1 is cancelled and the wal is applied.
  • A transaction t1 started at 10:01:00 on replica.
    At 10:01:30, a wal arrives at the replica (dead rows deletion change).
    At 10:01:30, t1 is still running and it is now in conflict state with the wal which just arrived.
    At 10:01:56, t2 started and it is also in conflict(assume) with the wal.
    At 10:01:57, t1 completed.
    At 10:02:00, t2 is still running.As the wal which arrived at 10:01:30 has been delayed by 30 seconds, now its applied and t2 will be cancelled.
    The wal changes were delayed by 30 seconds for t1(10:02:00–10:01:30) and 4 seconds for t2.

So max_standby_streaming_delay is the not the maximum delay that the replica delays application of wal per conflicting query(and there by delays cancelling the query) but its the maximum delay from the time the wal arrived at the replica.

What points to remember when tweaking ?

  • In a standby server that exists primarily for high availability, it’s best to set the delay parameters relatively short, so that the server cannot fall far behind the primary due to delays caused by standby queries.
  • The one caveat about long-running queries and setting these values higher is that other queries running on the slave in parallel with the long-running one which is causing the WAL action to be delayed will see old data until the long query has completed.
  • If you increase the param to a too higher value, the value of the follower as a standby / backup diminishes as any other queries running on the follower may return stale data.

--

--

kiran

I am a software engineer by profession. I write so as to remind my selves when I forget :-)