Database Migrations Explained

Zero-Downtime Database Migrations at Scale

Jun 03, 2026

Deconstructing how to alter schemas and move infrastructure without interrupting live traffic.

Database Migration: What It Is and How to Execute It – Astera — Database migration is **the process of moving data from one or more source databases to one or more target databases, or incrementally modifying a database's structural schema**. It ensures data integrity, minimizes downtime, and supports system scalability.

Deploying new application code is a relatively simple operation because compute servers are stateless. You can spin up a fleet of new servers, route incoming traffic to them, and safely terminate the old ones.

Database Migration Patterns. Moving a database might not sound… | by Jared Hatfield | Medium — **Database migrations are significantly more difficult and risky than code migrations**. While code is ephemeral—meaning you can easily roll back a deployment, test branches locally, or deploy fixes instantly—database migrations are permanent, stateful, and directly impact live data. A bad code push causes downtime, but a bad database migration can cause permanent data loss or corruption.

Databases, however, are entirely stateful. You cannot simply delete an old database and deploy a new one without permanently losing user data. Furthermore, relational databases utilize internal locks to maintain strict data integrity. If you attempt to add a new column to a table containing 500 million rows, the database engine may lock that table for hours to rewrite the underlying files, blocking all incoming read and write requests and effectively bringing the platform offline.

To evolve database infrastructure at scale, engineers must abandon single-step changes and utilize phased, non-blocking migration strategies.

The Offline Migration

The simplest strategy is to accept scheduled downtime. The engineering team displays a maintenance page, shuts down all application servers so no new data is written, runs a script to alter the database schema or copy the data to a new server, and then brings the application back online.

While this guarantees absolute data consistency because the data is perfectly frozen during the move, it is rarely a viable option. For globally distributed applications, massive e-commerce platforms, or critical SaaS products, forced downtime is financially and operationally unacceptable. System architecture must be designed to change while the engine is still running.

An offline database migration involves stopping all application traffic, exporting your data from the source database, importing it into the target system, and rerouting traffic. It is best for small datasets (under 100GB), lower environments, or situations where extended service downtime is acceptable.

The Expand and Contract Pattern

When you need to change a schema in a live relational database—for example, splitting a single full_name column into first_name and last_name—you cannot execute the change in a single step. If you drop the full_name column while the legacy application code is still running, the application will instantly crash when it attempts to query the missing data.

Expand Contract Pattern and Continuous Delivery of Databases | by Leena | Continuous Delivery | Medium — The **Expand and Contract pattern** (often called the Parallel Change pattern) is **a software development strategy used to implement breaking changes—such as database migrations or API updates—with zero downtime**. It achieves safe transitions by breaking the release into three continuous stages: **Expand**, **Migrate**, and **Contract**.

The Expand and Contract pattern solves this by breaking the migration into small, backward-compatible deployments:

Expand: You alter the database to add the new first_name and last_name columns, explicitly allowing them to be nullable. You do not touch or alter the existing full_name column. Adding a nullable column is usually a fast metadata operation that avoids long table locks.
Dual Write: You deploy new application code. When a user updates their profile, the application writes the data to the old full_name column and the new first_name/last_name columns simultaneously.
Backfill: A background script iterates through the millions of existing database records, parses the old full_name data, and populates the new columns for all historical rows.
Transition: You deploy another application update that explicitly reads from the new first_name and last_name columns, completely abandoning the old column.
Contract: Months later, once you are absolutely certain the new schema is stable and no legacy code is referencing it, you safely run a script to DROP the old full_name column, reclaiming the disk space.

Database Schema Migrations with Zero Downtime: The Expand-Contract Pattern

𝐋𝐞𝐚𝐫𝐧 𝐭𝐨 𝐛𝐮𝐢𝐥𝐝 𝐆𝐢𝐭, 𝐃𝐨𝐜𝐤𝐞𝐫, 𝐑𝐞𝐝𝐢𝐬, 𝐇𝐓𝐓𝐏 𝐬𝐞𝐫𝐯𝐞𝐫𝐬, 𝐚𝐧𝐝 𝐜𝐨𝐦𝐩𝐢𝐥𝐞𝐫𝐬, 𝐟𝐫𝐨𝐦 𝐬𝐜𝐫𝐚𝐭𝐜𝐡. Get 40% OFF CodeCrafters: https://app.codecrafters.io/join?via=the-coding-gopher

Migrating Engines: Dual Writes

If you are migrating off one database engine entirely—such as moving from a relational database like PostgreSQL to a NoSQL database like DynamoDB—you must run both databases in parallel to ensure a seamless cutover.

This is typically managed at the application layer:

The application is updated to write incoming data to both the old PostgreSQL database and the new DynamoDB database. PostgreSQL remains the primary source of truth for all reads.
A background job copies all historical data from PostgreSQL to DynamoDB.
The application is updated to execute “dark reads.” It queries both databases, but only returns the PostgreSQL data to the user. In the background, it compares the DynamoDB result to the PostgreSQL result and logs any discrepancies, mathematically proving the new database is functioning correctly.
Once validation is completely clean, a configuration flag is flipped. The application starts reading directly from DynamoDB, and PostgreSQL is eventually decommissioned.

Note: A dark read (often referred to alongside "dark writes") is a software testing technique where a new service, database, or feature is run in parallel with an existing production system. User requests are sent to the original system, but the request is simultaneously duplicated and sent "in the dark" to the new system

While effective, application-level dual writes are highly susceptible to partial failures. If the application successfully writes to PostgreSQL but a brief network timeout causes the DynamoDB write to fail, the two databases are instantly out of sync.

Infrastructure Synchronization: Change Data Capture (CDC)

To solve the partial failure problem of dual writes, engineers push the synchronization logic down from the application code into the infrastructure layer using Change Data Capture (CDC).

What Is Change Data Capture - Seattle Data Guy — **Change Data Capture (CDC)** is **a data integration pattern that identifies and tracks row-level changes (inserts, updates, and deletes) in a database**. Instead of extracting entire datasets repeatedly, CDC streams only the modified records to downstream systems (like data warehouses or event buses) in near real-time.

Relational databases maintain an internal, append-only log of every transaction that occurs to survive sudden crashes (such as the Write-Ahead Log in PostgreSQL or the Binlog in MySQL).

A CDC tool constantly tails this internal log. When the application writes a row to the primary database, the CDC system instantly reads that raw transaction event and asynchronously streams it—often via an event queue like Apache Kafka—directly into the secondary database.

This removes the burden of dual-writing from the application completely. The application only writes to the primary database, and the infrastructure guarantees that the secondary database will eventually and accurately reflect those exact changes, making zero-downtime database migrations significantly safer and more resilient to network failures.

Thanks for reading! This post is public so feel free to share it.

The Coding Gopher

Discussion about this post

Ready for more?