Migration Writer
Write a safe, reversible, zero-downtime database migration using expand-contract — add the new shape, backfill in batches, switch reads/writes, then drop the old — so every deploy stays compatible with the running app version. Use when adding or changing schema on a live system, renaming/dropping a column, adding NOT NULL or a foreign key on a large table, or when a migration risks locks, table rewrites, or an unrevertable step.
npx agentscamp add skills/migration-writerInstall to ~/.claude/skills/migration-writer/SKILL.md
Most schema migrations break prod because they assume a single atomic deploy where old and new code never coexist. This skill writes migrations the expand-contract way: each step is independently deployable, reversible with a real down migration, and free of write-blocking locks — so a live system stays up while the schema changes underneath it.
Most schema migrations break production not because the SQL is wrong but because they assume the database and the app flip over in one atomic instant. During a rolling deploy, old and new code run at the same time against one schema — so a migration that the new code needs will crash the old code, and a rollback that the old code needs is gone the moment you DROP. This skill writes migrations the expand-contract way: each step is independently deployable against the version before and after it, every change has a real down, and no step takes a lock that blocks writes on a hot table.
When to use this skill
- Adding, renaming, dropping, or retyping a column on a table that a live app reads/writes.
- Adding
NOT NULL, aCHECK, a foreign key, or a unique constraint to a table with existing rows. - Creating an index on a large/busy table, or backfilling a new column across millions of rows.
- Splitting/merging tables, moving a column, or any change where old and new app code must coexist during the deploy.
Instructions
- Decide the expand-contract phases first, before writing SQL. A column rename
a → bis not one migration; it is: (1) addbnullable, (2) dual-writeaandbin app code, (3) backfillbfroma, (4) switch reads tob, (5) stop writinga, (6) dropa. Each phase ships and is safe to roll back to the phase before it. Name the phases explicitly in the output, mapped to app deploys. - Make additive changes nullable / without a default rewrite.
ADD COLUMN ... NULLis instant. Adding a column with a non-constant default (or, on old engines, any default) rewrites the table under a lock — split it into add-nullable, then backfill, then set default for future rows. - Add
NOT NULLandCHECKwithout a blocking scan. On Postgres:ADD CONSTRAINT ... CHECK (...) NOT VALID, thenVALIDATE CONSTRAINT(takes only aSHARE UPDATE EXCLUSIVElock, doesn't block writes). ForNOT NULL, add the validatedCHECK (col IS NOT NULL)first, then promote — neverSET NOT NULLcold on a big table, which full-scans under anACCESS EXCLUSIVElock. - Build indexes and FKs concurrently / unvalidated.
CREATE INDEX CONCURRENTLY(andDROP INDEX CONCURRENTLY) so writes keep flowing; add foreign keys asNOT VALIDthenVALIDATE CONSTRAINTin a second step. Concurrent index builds run outside a transaction — keep them in their own migration with no other statements. - Backfill in bounded batches, never one transaction. Update in chunks (e.g.
WHERE id BETWEEN ...orLIMIT nloops) committing each batch, with a short sleep between batches to spare replication and locks. Keep the backfill in a separate migration/job from the schema DDL so a slow backfill can't hold a DDL lock and a failed batch doesn't roll back the whole table. - Write a real
downfor everyup. The down must actually reverse the change (drop the added column/index/constraint), or, where reversal loses data (a dropped column, a narrowed type), say so loudly and add an export/backup step to the up rather than pretending it's reversible. - State the deploy ordering contract. For each migration, note which app version it requires and which it must remain compatible with: backward-compatible (expand) migrations run before the code that needs them; destructive (contract) migrations run after all code that used the old shape is fully rolled out and confirmed.
WARNING
A single-transaction backfill (UPDATE big_table SET ... with no batching) holds row locks on every touched row until commit, bloats WAL, can deadlock with live traffic, and on failure rolls back hours of work. Always batch and commit; treat any unbounded UPDATE/DELETE on a large table as a production incident waiting to happen.
WARNING
Type changes that rewrite the table (ALTER COLUMN ... TYPE between incompatible types, e.g. int → bigint on older Postgres) take an ACCESS EXCLUSIVE lock and block all reads and writes for the duration. Prefer expand-contract: add a new column of the target type, backfill, switch over, drop the old — never an in-place rewrite on a hot table.
NOTE
Don't take ACCESS EXCLUSIVE DDL with lock_timeout = 0. Set a short lock_timeout (e.g. 5s) so a migration that can't grab its lock fails fast and retries, instead of queueing behind a long query and stalling every write that piles up behind it.
Output
For the requested change, produce:
- The
upanddownmigration — split into separate files per expand-contract phase, withCONCURRENTLY/NOT VALID/VALIDATEused where they avoid blocking locks. - The backfill + rollout sequence — the ordered phases (add → dual-write → backfill → switch reads → stop old writes → drop), each tagged with the app deploy it pairs with, and the batched backfill loop as a separate step.
- Locking & risk notes — for each statement: the lock it takes, whether it blocks reads/writes, whether it rewrites the table, and whether the
downis lossless — with destructive/irreversible steps called out explicitly.
Frequently asked questions
- Why split a column rename into multiple migrations instead of ALTER ... RENAME?
- A rename is atomic in the database but not in your fleet: during a rolling deploy, old pods still query the old name while new pods query the new one, so one side throws. Expand-contract (add new, backfill, dual-write, switch reads, drop old) keeps both names valid across the deploy window.
- Do I really need a down migration if we only roll forward?
- Yes. The down is your tested rollback path when a deploy goes bad at 2am, and writing it forces you to confront irreversible steps (a dropped column's data is gone) before they ship — which is exactly when you'd add a backup/export step instead.
Related
- Postgres Index StrategistRecommend the right Postgres index for a query or workload — choosing B-Tree vs. GIN vs. BRIN vs. partial/covering/expression, checking for redundant or unused indexes, and verifying the choice against the query plan. Use when a query needs an index, when deciding an index type for jsonb/array/full-text/time-series data, or when auditing an over-indexed table.
- SemVer AdvisorDecide the correct semantic-version bump — major, minor, or patch — by diffing a release range, mapping the changes onto the public API surface, and classifying each as breaking, additive, or a fix. Use before cutting a release when you are unsure whether changes are breaking, when a teammate proposes a bump you want to sanity-check, or when a behavior change has no signature change and you need to know if it is still breaking.
- Version BumperBump the project version everywhere it lives in one consistent pass — package.json, lockfile, nested/CLI package manifests, version constants, README badges, docs — then roll the changelog's Unreleased section under the new version and stage an annotated git tag. Use when you've already decided the new version (X.Y.Z or a pre-release like -rc.1) and need every artifact updated to the same value without drift, or before cutting a release.
- Connection Pool TunerSize and tune a database connection pool from the real constraint — the database's shared max_connections and its core count — so total connections (per-instance pool × instance count) stay safely under the cap and a too-large pool stops adding latency. Use when the app throws 'too many connections' or pool-acquire timeouts, when the DB is saturated by connection count, or when deploying to serverless.
- Deadlock DiagnoserDiagnose a database deadlock from the engine's own deadlock report, reconstruct the lock cycle (A holds 1 wants 2, B holds 2 wants 1), name the root cause — almost always two code paths locking the same rows in different orders — and fix it with consistent lock ordering, shorter transactions, and a retry-the-victim safeguard. Use when the DB logs deadlock errors, when transactions intermittently fail under load, or when queries mysteriously block each other.
- Query Plan AnalyzerRead a slow query's execution plan and turn it into a concrete fix — the exact index to add, the rewrite, or the ANALYZE to run — by getting the REAL plan with EXPLAIN ANALYZE (actual rows + timing, not estimates), finding the offending node, and confirming the fix removes it. Use when one specific query is slow and you need to know WHY, not just that it is.
- Strangler Fig MigratorPlan the incremental replacement of a legacy module or service using the strangler-fig pattern — grow new code around the old behind an interception seam until the old is dead, instead of a big-bang rewrite. Use when a legacy system is too risky to rewrite at once, or when migrating off a deprecated framework/dependency gradually while staying shippable and rollback-able at every step.
- Canary Release PlannerDesign a canary / progressive rollout so a bad release reaches 1% of users instead of 100% — staged traffic with bake times, gating metrics compared against the concurrently-running stable baseline, and automated promote-or-rollback. Use when shipping a risky change, when you want automatic rollback on regression, or when moving off all-at-once deploys.
- Release Notes WriterWrite user-facing release notes — the curated 'what's new and what it means for you' — by starting from the real changes (git log / merged PRs / the changelog since the last release) and translating developer-speak into user impact, grouped by what the user cares about with breaking changes and required actions surfaced first. Use when shipping a release to users or customers and the raw commit log isn't something a user should read, when you need a published GitHub-release / blog / in-app announcement, or when a breaking change must be made unmissable so upgrades don't break.
- Seed DataGenerate realistic, referentially-consistent seed data and a re-runnable seed script from your actual schema — types and constraints respected, plausible values, FK-dependency insert order, idempotent, never aimed at production.