Posgres Concurrency Notes: 1

“Correct behavior” basically means serializability. The transactions should appear to execute one at a time, in isolation. When two transactions physically overlap in time, we don’t care which one appears to go first; but we don’t want to see any behavior that reveals that they were concurrent. Any such behavior would look like a bug to one or both users.

The easy way to do this would be to force transactions to actually go one at a time . . . but for high-volume sites that kills performance. Ideally, no transaction should block another from proceeding.

Here’s the key idea in PostgreSQL’s implementation of concurrency:

Snapshots isolate transactions from the effects of concurrently running transactions.
Each transaction runs against a notional “snapshot” of the committed state of the database at a particular instant. It cannot see the effects of uncommitted transactions (except itself), nor of transactions that commit after the snapshot time.

Every update of a database row generates a new version of the row.

Snapshot information determines which version of a given row is visible to a given transaction. Old versions of rows can be reclaimed (by VACUUM) once they are no longer visible to any running transaction.

You can run PostgreSQL transactions in either of two modes:

SERIALIZABLE: a snapshot is taken at start of a transaction (actually, at start of first query within transaction). A reader’s view of the database is thus fixed throughout a transaction.
READ COMMITTED: a new snapshot is taken at start of each interactive query. So, view of the database is stable within an interactive query, but can change across queries within a transaction.

T1:                           
BEGIN
S1
S2 T2:
S3 UP1
COMMIT
S4
COMMIT

If you start one serializable transaction and hold it for the entire run of an application, you’ll never observe any changes in database state, other than those you make yourself. This is what you want in a few cases (for instance, the backup utility pg dump does it to make a self-consistent dump) but usually it’s not what you want.

Holding a transaction open means that rows potentially visible to that transaction can’t be reclaimed during VACUUM, even if they’re long dead to everyone else.

Two concurrent transactions cannot write (UPDATE, DELETE, or SELECT FOR UPDATE) the same row. They can proceed as long as they write nonoverlapping sets of rows.
If a transaction tries to write a row already written by a concurrent transaction: 1. If other transaction is still in progress, wait for it to commit or abort.
2. If it aborted, proceed (using the non-updated version of the row).
3. If it committed:
(a) In SERIALIZABLE mode: abort with “can’t serialize” error.
(b) In READ COMMITTED mode: proceed, using the newly-committed (latest) version of the row as the starting point for my operation. (But first, recheck the new row to see if it still satisfies the WHERE clause of my query; if not, ignore it.)

We want to keep count of hits on different pages of a website. We have a table with one row per webpage, and we are going to have a lot of clients (webserver threads) concurrently executing something like
UPDATE webpages SET hits = hits + 1 WHERE url = ’…’;
Question: is this safe? In particular, can we ever “lose” a hit count because two clients do this statement in parallel on the same row?

In SERIALIZABLE mode, the second client would instead get a “Can’t serialize access due to concurrent update” error, and would have to retry the transaction until he succeeded. You always need to code a retry loop in the client when you are using SERIALIZABLE mode for a writing transaction.

Suppose we wanted to do our updates as a SELECT then UPDATE:

BEGIN;
SELECT hits FROM webpages WHERE url = ’…’;
— client internally computes $newval = $hits + 1
UPDATE webpages SET hits = $newval WHERE url = ’…’;
COMMIT;

Notice SELECT FOR UPDATE may return data committed after its start time, which is different from plain SELECT’s behavior in READ COMMITTED mode.

BEGIN; 
SELECT hits FROM webpages WHERE url = ’…’;
UPDATE webpages SET hits = hits + 1 WHERE url = ’…’;
COMMIT;

The READ COMMITTED SELECT FOR UPDATE approach is essentially pessimistic locking: get the lock first, then do your work.
The SERIALIZABLE retry-loop approach is essentially optimistic locking with retry. It avoids the overhead of getting the row lock beforehand, at the cost of having to redo the whole transaction if there is a conflict.
SELECT FOR UPDATE wins if contention is heavy (since the serializable approach will be wasting work pretty often). SERIALIZABLE wins if contention for specific rows is light.

We have a classic deadlock situation. PostgreSQL will detect the deadlock and get out of it by aborting one of the two transactions with a deadlock error. (The other one will then be able to complete.)

We can recover from such failures by placing a retry loop around the whole transaction on the client side. The transaction that got killed to resolve the deadlock will be retried, and eventually will complete successfully.
In such cases READ COMMITTED isn’t any simpler to program than SERIALIZABLE mode: you need the same retry loop either way. (The expected failures are slightly different, but the response to them is the same.)

BEGIN; 
UPDATE my accounts SET balance = balance - $withdrawal WHERE accountid = ’checking’;
SELECT SUM(balance) FROM my accounts;
if (sum > = 1000.00)
COMMIT;
else
ROLLBACK;

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store