Postgres Concurrency Notes: 1

kiran
5 min read · Jan 30, 2022

Excerpts from the official docs

I am just highlighting the points from this doc on concurrency.

Transactions need correct behavior and performance

“Correct behavior” basically means serializability. The transactions should appear to execute one at a time, in isolation. When two transactions physically overlap in time, we don’t care which one appears to go first; but we don’t want to see any behavior that reveals that they were concurrent. Any such behavior would look like a bug to one or both users.

The easy way to do this would be to force transactions to actually go one at a time . . . but for high-volume sites that kills performance. Ideally, no transaction should block another from proceeding.

Concurrent Transactions? How? Snapshots?

Here’s the key idea in PostgreSQL’s implementation of concurrency:

Snapshots isolate transactions from the effects of concurrently running transactions.
Each transaction runs against a notional “snapshot” of the committed state of the database at a particular instant. It cannot see the effects of uncommitted transactions (except itself), nor of transactions that commit after the snapshot time.

Every update of a database row generates a new version of the row.

Snapshot information determines which version of a given row is visible to a given transaction. Old versions of rows can be reclaimed (by VACUUM) once they are no longer visible to any running transaction.
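For instance, here is a minimal two-session sketch of snapshot visibility (it borrows the webpages table from the hit-counter example later in this post, so the table and values are illustrative only):

-- Session A
BEGIN;
UPDATE webpages SET hits = hits + 1 WHERE url = '…';   -- creates a new row version

-- Session B, running concurrently in READ COMMITTED mode
SELECT hits FROM webpages WHERE url = '…';   -- sees the old version; A has not committed yet

-- Session A
COMMIT;

-- Session B, next query (READ COMMITTED takes a fresh snapshot per query)
SELECT hits FROM webpages WHERE url = '…';   -- now sees the incremented value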

Transaction modes (isolation levels)

You can run PostgreSQL transactions in either of two modes:

SERIALIZABLE: a snapshot is taken at start of a transaction (actually, at start of first query within transaction). A reader’s view of the database is thus fixed throughout a transaction.
READ COMMITTED: a new snapshot is taken at start of each interactive query. So, view of the database is stable within an interactive query, but can change across queries within a transaction.
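For reference, here is a quick sketch of how a client picks the mode for a particular transaction (standard PostgreSQL syntax; READ COMMITTED is the default):

BEGIN;                                          -- runs in READ COMMITTED, the default
COMMIT;

BEGIN ISOLATION LEVEL SERIALIZABLE;             -- snapshot is fixed at the first query
COMMIT;

BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;   -- same effect, if issued before the first query
COMMIT;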

Consider this timeline: transaction T1 runs statements S1 through S4, while a concurrent transaction T2 performs an update UP1 and commits between S3 and S4.

T1:
BEGIN
S1
S2                  T2:
S3                  UP1
                    COMMIT
S4
COMMIT

When T1 runs in READ COMMITTED mode, the update that T2 committed is visible to S4 (S4 gets a fresh snapshot).
Whereas when T1 runs in SERIALIZABLE mode, T2’s update is of course not visible to S1, S2, or S3, and not even to S4.

If you start one serializable transaction and hold it for the entire run of an application, you’ll never observe any changes in database state, other than those you make yourself. This is what you want in a few cases (for instance, the backup utility pg_dump does it to make a self-consistent dump) but usually it’s not what you want.

So pg_dump runs in SERIALIZABLE mode.

Open transactions hold on to dead rows, so don’t keep a transaction open for a long time

Holding a transaction open means that rows potentially visible to that transaction can’t be reclaimed during VACUUM, even if they’re long dead to everyone else.
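One way to spot long-open transactions is to query pg_stat_activity (a sketch; these column names exist in PostgreSQL 9.2 and later):

SELECT pid, state, xact_start, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;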

Two concurrent transactions cannot write (UPDATE, DELETE, or SELECT FOR UPDATE) the same row. They can proceed in parallel as long as they write nonoverlapping sets of rows.

If a transaction tries to write a row already written by a concurrent transaction (a two-session sketch follows this list):
1. If the other transaction is still in progress, wait for it to commit or abort.
2. If it aborted, proceed (using the non-updated version of the row).
3. If it committed:
(a) In SERIALIZABLE mode: abort with a “can’t serialize” error.
(b) In READ COMMITTED mode: proceed, using the newly committed (latest) version of the row as the starting point for the operation. (But first, recheck the new row to see if it still satisfies the WHERE clause of the query; if not, ignore it.)
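Here is the sketch, using a hypothetical table t with columns id and val (the hit-counter example below works through the same rules on a concrete case):

-- Session A
BEGIN;
UPDATE t SET val = val + 1 WHERE id = 1;   -- writes the row and keeps it locked

-- Session B (concurrently)
BEGIN;
UPDATE t SET val = val + 1 WHERE id = 1;   -- blocks here, waiting for A

-- Session A
COMMIT;

-- Session B now continues:
--   in READ COMMITTED, its UPDATE proceeds against A’s newly committed row version
--   in SERIALIZABLE, it fails with a “can’t serialize access due to concurrent update” error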

We want to keep count of hits on different pages of a website. We have a table with one row per webpage, and we are going to have a lot of clients (webserver threads) concurrently executing something like
UPDATE webpages SET hits = hits + 1 WHERE url = '…';
Question: is this safe? In particular, can we ever “lose” a hit count because two clients do this statement in parallel on the same row?

An UPDATE is a read plus a write. In READ COMMITTED mode, if another transaction updates the row in between (creating a newer version of it), the row is re-read and the increment is applied to the latest committed version, so the hit counts stay reliable even when many clients execute this statement concurrently.

In SERIALIZABLE mode, the second client would instead get a “can’t serialize access due to concurrent update” error, and would have to retry the transaction until it succeeded. You always need to code a retry loop in the client when you are using SERIALIZABLE mode for a writing transaction.
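A minimal sketch of such a loop, in the same SQL-plus-client-pseudocode style as the examples below (the retry label and the error check are client-side, not SQL):

-- retry:
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE webpages SET hits = hits + 1 WHERE url = '…';
COMMIT;
-- if the UPDATE or COMMIT fails with SQLSTATE 40001 (serialization_failure),
-- issue ROLLBACK and go back to retry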

Suppose we wanted to do our updates as a SELECT then UPDATE:

BEGIN;
SELECT hits FROM webpages WHERE url = '…';
-- client internally computes $newval = $hits + 1
UPDATE webpages SET hits = $newval WHERE url = '…';
COMMIT;

Here READ COMMITTED doesn’t work: the UPDATE does re-read the row, but $newval was already computed by the client from the SELECT’s result, so it is based on the previous hits count and a concurrent increment gets lost.
To make the above work, use SELECT FOR UPDATE, not just SELECT, to read the input data.
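With that change, the transaction could look like this (a sketch; only the SELECT differs from the version above):

BEGIN;
SELECT hits FROM webpages WHERE url = '…' FOR UPDATE;   -- locks the row before reading it
-- client internally computes $newval = $hits + 1
UPDATE webpages SET hits = $newval WHERE url = '…';
COMMIT;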

Notice SELECT FOR UPDATE may return data committed after its start time, which is different from plain SELECT’s behavior in READ COMMITTED mode.

You can achieve the same thing using SERIALIZABLE mode with a retry loop.

BEGIN;
SELECT hits FROM webpages WHERE url = '…';
UPDATE webpages SET hits = hits + 1 WHERE url = '…';
COMMIT;

The above looks right, but unless it really runs in SERIALIZABLE mode there is an inconsistency between what your app sees at the SELECT and what the UPDATE actually operates on.

The READ COMMITTED SELECT FOR UPDATE approach is essentially pessimistic locking: get the lock first, then do your work.
The SERIALIZABLE retry-loop approach is essentially optimistic locking with retry. It avoids the overhead of getting the row lock beforehand, at the cost of having to redo the whole transaction if there is a conflict.
SELECT FOR UPDATE wins if contention is heavy (since the serializable approach will be wasting work pretty often). SERIALIZABLE wins if contention for specific rows is light.

Transfer $100.00 from Alice’s account to Bob’s account.
Suppose another transaction is concurrently transferring from Bob’s account to Alice’s.

We have a classic deadlock situation. PostgreSQL will detect the deadlock and get out of it by aborting one of the two transactions with a deadlock error. (The other one will then be able to complete.)
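A sketch of how the two sessions tie themselves up, assuming a hypothetical accounts table keyed by owner:

-- Session A: Alice -> Bob
BEGIN;
UPDATE accounts SET balance = balance - 100.00 WHERE owner = 'alice';

-- Session B: Bob -> Alice (concurrently)
BEGIN;
UPDATE accounts SET balance = balance - 100.00 WHERE owner = 'bob';

-- Session A
UPDATE accounts SET balance = balance + 100.00 WHERE owner = 'bob';     -- blocks, waiting for B

-- Session B
UPDATE accounts SET balance = balance + 100.00 WHERE owner = 'alice';   -- blocks, waiting for A: deadlock
-- PostgreSQL notices the cycle and aborts one of the two transactions with a deadlock error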

We can recover from such failures by placing a retry loop around the whole transaction on the client side. The transaction that got killed to resolve the deadlock will be retried, and eventually will complete successfully.
In such cases READ COMMITTED isn’t any simpler to program than SERIALIZABLE mode: you need the same retry loop either way. (The expected failures are slightly different, but the response to them is the same.)

Multi-row constraints, or why serializable mode isn’t really serializable

BEGIN;
UPDATE my_accounts SET balance = balance - $withdrawal WHERE accountid = 'checking';
SELECT SUM(balance) FROM my_accounts;
if (sum >= 1000.00)
COMMIT;
else
ROLLBACK;

Is this safe? NO, not even in SERIALIZABLE mode: each of two concurrent transactions computes SUM(balance) against its own snapshot, so neither sees the other’s withdrawal, and both can commit even though the combined result breaks the $1000.00 minimum-balance rule.
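A concrete sketch of the failure, assuming my_accounts holds a 'checking' and a 'savings' row with $600.00 each, and the snapshot-style SERIALIZABLE described in this post (in PostgreSQL 9.1 and later this behavior is called REPEATABLE READ):

-- Both sessions issue their first statement before either one commits,
-- so each snapshot predates the other's withdrawal.

-- Session A
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE my_accounts SET balance = balance - 200.00 WHERE accountid = 'checking';

-- Session B (concurrently)
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE my_accounts SET balance = balance - 200.00 WHERE accountid = 'savings';

-- Session A
SELECT SUM(balance) FROM my_accounts;   -- sees 1000.00: its own update, not B's
COMMIT;

-- Session B
SELECT SUM(balance) FROM my_accounts;   -- also sees 1000.00: its own update, not A's
COMMIT;

-- Both commits succeed, yet the real total is now 800.00, below the $1000.00 minimum.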
