# Zero-Downtime Postgres Migrations: The Expand-Contract Playbook for 2026

> How to change a live Postgres schema without downtime or broken deploys — the expand-contract pattern, safe column changes, batched backfills, and CONCURRENTLY.

You can't take a breaking schema change to a live database in one step without risking downtime or a broken deploy. The expand-contract pattern decomposes every change into backward-compatible steps — expand the schema, backfill and dual-write, migrate reads, then contract — deployed across releases so old and new application code run side by side the whole time.

Changing the schema of a database that's serving live traffic is one of the most reliably dangerous things a team does. The failure isn't usually a dropped table — it's subtler: a migration that takes an exclusive lock and stalls every write for two minutes, or a deploy where the new code expects a column the old code just renamed, so for thirty seconds half your servers throw errors. The fix is not a better migration tool. It's a **pattern**: never put the database in a state that the currently-deployed application code can't handle.

That pattern is **expand-contract**, and this guide is the playbook.

## The one rule: every step is backward-compatible

Modern deploys are rolling — old and new versions of your app run **at the same time** for at least a few seconds, often minutes. So the schema must be valid for *both* at every moment. That single constraint rules out the entire class of "change it in place" migrations and replaces them with a sequence of additive steps:

> **Expand** the schema → **migrate** the data and writes → **contract** away the old — with each phase deployed separately, so you can stop or roll back at any point.

If you internalize one thing: **add before you remove, and never remove in the same release you add.**

## The phases

### Expand — add, never change

Make the change additively. Add a new nullable column, a new table, a new index — anything the old code can simply ignore. Crucially, avoid operations that rewrite or long-lock a large table:

- Add columns **nullable** (a *non-volatile* `DEFAULT` is cheap on modern Postgres — stored in the catalog and applied lazily; a *volatile* one like `random()` or `clock_timestamp()` rewrites the table — don't).
- Create indexes with **`CREATE INDEX CONCURRENTLY`** so the build doesn't block writes.
- Add constraints in two steps: **`ADD CONSTRAINT … NOT VALID`** (fast, skips the full scan) then **`VALIDATE CONSTRAINT`** (online).

After the expand phase, the old application still runs unchanged — you've only added things it doesn't know about.

### Migrate — backfill, dual-write, switch reads

Now move the data and the traffic, still without breaking the old path:

1. **Backfill in batches.** Populate the new column/table from the old in small, resumable chunks (by id or time range), pausing between batches. A single `UPDATE` across millions of rows holds locks, bloats the table, and floods replication — the very outage you're avoiding.
2. **Dual-write.** Deploy app code that writes **both** the old and new shape on every change but still reads the old. Live traffic now keeps both current, and this deploy is safely reversible.
3. **Migrate reads.** Deploy app code that reads the new shape. The old data is still there as a safety net; verify the new path in production before trusting it.

### Contract — remove the old, later

In a **separate, later release** — after the new path is proven and no deployed code references the old column or constraint — drop the old column, stop dual-writing, and clean up. Putting a release boundary between *add* and *remove* is exactly what keeps the whole change reversible: if the new path misbehaves, the old column is still there to fall back to.

## The moves that bite people

- **Renames.** A rename is a remove-and-add disguised as one step. Do it the long way: add new column → dual-write → backfill → switch reads → drop old. Never `RENAME` in place on a live table.
- **`NOT NULL` on a big table.** Add the column nullable, backfill, add a `CHECK (col IS NOT NULL) NOT VALID` and `VALIDATE` it, then run `SET NOT NULL` — which now *skips* the table scan because the validated `CHECK` already proves there are no NULLs — and drop the redundant `CHECK`. The naive `ALTER … SET NOT NULL` without that validated check scans the whole table under an exclusive lock.
- **Type changes.** Treat like a rename: add a new column of the new type, dual-write/backfill, switch, drop. An in-place `ALTER TYPE` can rewrite and lock the table.
- **Dropping things eagerly.** Dropping a column or constraint while old code still references it breaks that code. Contract only after the old code is fully gone.

> [!WARNING]
> The dangerous migrations are the ones that take a long lock or rewrite a large table: adding a column with a volatile default, `SET NOT NULL` directly, a plain `CREATE INDEX`, or a single massive `UPDATE`. Each blocks writes for the duration. Every safe alternative above exists to avoid exactly that lock.

> [!TIP]
> Tools like [pgroll](/tools/pgroll) automate expand-contract by keeping multiple schema versions live behind views, so old and new app versions each see the shape they expect during the rollout — turning the discipline above into a managed, reversible workflow.

## Putting it together

Zero-downtime migration is a sequencing discipline, not a feature. Decompose every breaking change into **expand → backfill → dual-write → migrate reads → contract**, deploy each phase on its own, build indexes `CONCURRENTLY`, validate constraints in two steps, and never remove in the same release you add. Do that and a schema change becomes a series of boring, reversible deploys instead of a maintenance window.

For running this end to end, the [postgres-migration-engineer](/agents/data-ai/postgres-migration-engineer) plans and executes the phased rollout with your migration tooling; the [DB Migrate](/commands/db/db-migrate) command generates and applies an individual migration with these safeguards built in.

---

_Source: https://agentscamp.com/guides/database/zero-downtime-postgres-migrations — Guide on AgentsCamp._
