Database Migrations in Production: Lessons Learned
After managing over 50 production database migrations—from small schema changes to complete database engine switches affecting millions of users—I've learned that the difference between a smooth migration and a career-defining disaster often comes down to preparation and having the right rollback strategy.
In this guide, I'll share the hard-won lessons from migrations that saved companies millions, and the mistakes that taught me why you should never underestimate the complexity of production data.
The $2 Million Migration Mistake
Let me start with the migration that changed how I approach database changes forever. It was a "simple" column addition to a user table with 50 million records. The migration looked innocent enough:
ALTER TABLE users ADD COLUMN preferences JSONB;
What we didn't account for was that PostgreSQL takes an ACCESS EXCLUSIVE lock on the table for this operation, blocking every read and write against it. For 47 minutes, our entire application was down while the migration completed. The business impact: $2.1 million in lost revenue, thousands of angry customers, and a very uncomfortable all-hands meeting.
This taught me the first rule of production migrations: there are no simple migrations at scale.
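If I ran that same migration today, I would wrap the DDL in aggressive timeouts so a queued ACCESS EXCLUSIVE lock can never take the application down with it. A minimal sketch, with illustrative timeout values:
-- Fail fast if the lock can't be acquired, instead of queueing behind
-- long-running queries and blocking everything that arrives after us.
SET lock_timeout = '2s';
SET statement_timeout = '30s';
ALTER TABLE users ADD COLUMN preferences JSONB;
-- If this errors with a lock timeout, back off and retry; the ALTER itself
-- is cheap once the lock is granted, because adding a nullable column
-- without a default does not rewrite the table.
RESET lock_timeout;
RESET statement_timeout;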
The Zero-Downtime Migration Framework
Since that disaster, I've developed a framework that has successfully handled migrations for tables with billions of rows without a single second of downtime. Here's the approach:
Phase 1: Shadow Table Strategy
Instead of altering tables directly, create shadow tables and migrate data incrementally:
-- Create the new table structure
CREATE TABLE users_new (
id BIGSERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL UNIQUE,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
preferences JSONB DEFAULT '{}'::jsonb,
-- Copy all existing columns
name VARCHAR(255),
status VARCHAR(50) DEFAULT 'active'
);
-- Copy existing indexes (adjust for new structure)
CREATE INDEX CONCURRENTLY idx_users_new_email ON users_new(email);
CREATE INDEX CONCURRENTLY idx_users_new_status ON users_new(status);
CREATE INDEX CONCURRENTLY idx_users_new_created_at ON users_new(created_at);
-- Add constraints after data migration
-- ALTER TABLE users_new ADD CONSTRAINT users_new_email_check CHECK (email ~ '^[^@]+@[^@]+\.[^@]+$');
Phase 2: Dual-Write Implementation
Implement dual-write logic in your application to write to both tables:
// services/userService.ts
import { DatabaseClient } from './database';
interface UserData {
email: string;
name: string;
preferences?: Record<string, any>;
}
// Row shape returned from the users tables
interface User extends UserData {
id: number;
status: string;
created_at: Date;
updated_at: Date;
}
class UserService {
private db: DatabaseClient;
private isDualWriteEnabled: boolean;
constructor(db: DatabaseClient) {
this.db = db;
this.isDualWriteEnabled = process.env.MIGRATION_DUAL_WRITE === 'true';
}
async createUser(userData: UserData): Promise<User> {
const transaction = await this.db.beginTransaction();
try {
// Write to old table (primary)
const user = await transaction.query(`
INSERT INTO users (email, name, created_at, updated_at)
VALUES ($1, $2, NOW(), NOW())
RETURNING *
`, [userData.email, userData.name]);
// If dual-write is enabled, also write to new table
if (this.isDualWriteEnabled) {
await transaction.query(`
INSERT INTO users_new (id, email, name, preferences, created_at, updated_at)
VALUES ($1, $2, $3, $4, $5, $6)
ON CONFLICT (id) DO UPDATE SET
email = EXCLUDED.email,
name = EXCLUDED.name,
preferences = EXCLUDED.preferences,
updated_at = EXCLUDED.updated_at
`, [
user[0].id,
userData.email,
userData.name,
userData.preferences || {},
user[0].created_at,
user[0].updated_at
]);
}
await transaction.commit();
return user[0];
} catch (error) {
await transaction.rollback();
throw error;
}
}
async updateUser(id: number, updates: Partial<UserData>): Promise<User> {
const transaction = await this.db.beginTransaction();
try {
// Update old table
const updateFields: string[] = [];
const values: any[] = [id];
let paramIndex = 2;
Object.entries(updates).forEach(([key, value]) => {
if (key !== 'preferences') { // Handle preferences separately for old table
updateFields.push(`${key} = $${paramIndex}`);
values.push(value);
paramIndex++;
}
});
updateFields.push(`updated_at = NOW()`);
const oldTableResult = await transaction.query(`
UPDATE users
SET ${updateFields.join(', ')}
WHERE id = $1
RETURNING *
`, values);
// Update new table if dual-write is enabled
if (this.isDualWriteEnabled) {
const newUpdateFields: string[] = [];
const newValues: any[] = [id];
let newParamIndex = 2;
Object.entries(updates).forEach(([key, value]) => {
newUpdateFields.push(`${key} = $${newParamIndex}`);
newValues.push(value);
newParamIndex++;
});
newUpdateFields.push(`updated_at = NOW()`);
await transaction.query(`
UPDATE users_new
SET ${newUpdateFields.join(', ')}
WHERE id = $1
`, newValues);
}
await transaction.commit();
return oldTableResult[0];
} catch (error) {
await transaction.rollback();
throw error;
}
}
}
export default UserService;
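Rolling this out behind the MIGRATION_DUAL_WRITE flag means the second write path can be switched on per environment without another deploy. A hypothetical usage sketch (the script path and sample data are illustrative):
// scripts/create-user-example.ts (illustrative)
import { DatabaseClient } from '../services/database';
import UserService from '../services/userService';

async function main() {
  // With MIGRATION_DUAL_WRITE=true, this insert also lands in users_new.
  const db = new DatabaseClient();
  const users = new UserService(db);

  const user = await users.createUser({
    email: 'jane@example.com',
    name: 'Jane Doe',
    preferences: { theme: 'dark' }
  });
  console.log('Created user', user.id);
}

main().catch(console.error);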
Phase 3: Background Data Migration
Create a robust background migration process:
// scripts/migrate-users-data.ts
import { DatabaseClient } from '../services/database';
import { createLogger } from '../utils/logger';
const logger = createLogger('user-migration');
interface MigrationConfig {
batchSize: number;
delayMs: number;
maxRetries: number;
validateChecksum: boolean;
}
export class UserDataMigration {
private db: DatabaseClient;
private config: MigrationConfig;
private stats = {
processed: 0,
migrated: 0,
skipped: 0,
failed: 0,
startTime: Date.now()
};
constructor(db: DatabaseClient, config: MigrationConfig) {
this.db = db;
this.config = config;
}
async migrate(): Promise<void> {
logger.info('Starting user data migration', { config: this.config });
try {
await this.createMigrationTrackingTable();
const totalRecords = await this.getTotalRecordCount();
logger.info(`Total records to migrate: ${totalRecords}`);
let lastId = await this.getLastMigratedId();
while (true) {
const batch = await this.getBatch(lastId);
if (batch.length === 0) {
break;
}
await this.migrateBatch(batch);
lastId = batch[batch.length - 1].id;
// Progress reporting
this.stats.processed += batch.length;
if (this.stats.processed % 10000 === 0) {
await this.reportProgress(totalRecords);
}
// Throttle to avoid overwhelming the database
if (this.config.delayMs > 0) {
await this.sleep(this.config.delayMs);
}
}
await this.validateMigration();
logger.info('Migration completed successfully', this.stats);
} catch (error) {
logger.error('Migration failed', { error: error.message, stats: this.stats });
throw error;
}
}
private async createMigrationTrackingTable(): Promise<void> {
await this.db.query(`
CREATE TABLE IF NOT EXISTS migration_progress (
migration_name VARCHAR(255) PRIMARY KEY,
last_migrated_id BIGINT DEFAULT 0,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
)
`);
await this.db.query(`
INSERT INTO migration_progress (migration_name, last_migrated_id)
VALUES ('users_to_users_new', 0)
ON CONFLICT (migration_name) DO NOTHING
`);
}
private async getLastMigratedId(): Promise<number> {
const result = await this.db.query(`
SELECT last_migrated_id
FROM migration_progress
WHERE migration_name = 'users_to_users_new'
`);
return result[0]?.last_migrated_id || 0;
}
private async getBatch(lastId: number): Promise<any[]> {
const result = await this.db.query(`
SELECT id, email, name, created_at, updated_at, status
FROM users
WHERE id > $1
ORDER BY id
LIMIT $2
`, [lastId, this.config.batchSize]);
return result;
}
private async migrateBatch(batch: any[]): Promise<void> {
const transaction = await this.db.beginTransaction();
try {
for (const user of batch) {
await this.migrateUser(transaction, user);
}
// Update progress tracking
const lastId = batch[batch.length - 1].id;
await transaction.query(`
UPDATE migration_progress
SET last_migrated_id = $1, updated_at = NOW()
WHERE migration_name = 'users_to_users_new'
`, [lastId]);
await transaction.commit();
this.stats.migrated += batch.length;
} catch (error) {
await transaction.rollback();
this.stats.failed += batch.length;
// Retry individual records on batch failure
if (this.config.maxRetries > 0) {
await this.retryFailedRecords(batch);
} else {
throw error;
}
}
}
private async migrateUser(transaction: any, user: any): Promise<void> {
// Check if user already exists in new table
const existing = await transaction.query(`
SELECT id FROM users_new WHERE id = $1
`, [user.id]);
if (existing.length > 0) {
this.stats.skipped++;
return;
}
// Migrate user with default preferences
await transaction.query(`
INSERT INTO users_new (id, email, name, created_at, updated_at, status, preferences)
VALUES ($1, $2, $3, $4, $5, $6, $7)
`, [
user.id,
user.email,
user.name,
user.created_at,
user.updated_at,
user.status || 'active',
'{}' // Default empty preferences
]);
}
private async retryFailedRecords(batch: any[]): Promise<void> {
logger.warn(`Retrying ${batch.length} failed records individually`);
for (const user of batch) {
const transaction = await this.db.beginTransaction();
try {
await this.migrateUser(transaction, user);
await transaction.commit();
this.stats.migrated++;
this.stats.failed--;
} catch (error) {
await transaction.rollback();
logger.error(`Failed to migrate user ${user.id}`, { error: error.message });
}
}
}
private async validateMigration(): Promise<void> {
if (!this.config.validateChecksum) {
return;
}
logger.info('Validating migration with checksum comparison...');
// Compare record counts
const [oldCount] = await this.db.query('SELECT COUNT(*) as count FROM users');
const [newCount] = await this.db.query('SELECT COUNT(*) as count FROM users_new');
if (oldCount.count !== newCount.count) {
throw new Error(`Record count mismatch: old=${oldCount.count}, new=${newCount.count}`);
}
// Sample validation - compare random records
const sampleSize = Math.min(1000, oldCount.count);
const randomUsers = await this.db.query(`
SELECT * FROM users
ORDER BY RANDOM()
LIMIT $1
`, [sampleSize]);
for (const user of randomUsers) {
const migratedUser = await this.db.query(`
SELECT * FROM users_new WHERE id = $1
`, [user.id]);
if (migratedUser.length === 0) {
throw new Error(`User ${user.id} not found in migrated table`);
}
// Validate key fields
const migrated = migratedUser[0];
if (user.email !== migrated.email || user.name !== migrated.name) {
throw new Error(`Data mismatch for user ${user.id}`);
}
}
logger.info(`Validation complete: ${sampleSize} records validated`);
}
private async getTotalRecordCount(): Promise<number> {
const result = await this.db.query('SELECT COUNT(*) as count FROM users');
return result[0].count;
}
private async reportProgress(total: number): Promise<void> {
const percentage = ((this.stats.processed / total) * 100).toFixed(2);
const elapsed = Date.now() - this.stats.startTime;
const rate = this.stats.processed / (elapsed / 1000);
logger.info('Migration progress', {
processed: this.stats.processed,
migrated: this.stats.migrated,
failed: this.stats.failed,
skipped: this.stats.skipped,
percentage,
recordsPerSecond: Math.round(rate),
elapsedMs: elapsed
});
}
private sleep(ms: number): Promise<void> {
return new Promise(resolve => setTimeout(resolve, ms));
}
}
// CLI script
async function main() {
const db = new DatabaseClient();
const config: MigrationConfig = {
batchSize: parseInt(process.env.MIGRATION_BATCH_SIZE || '1000'),
delayMs: parseInt(process.env.MIGRATION_DELAY_MS || '100'),
maxRetries: parseInt(process.env.MIGRATION_MAX_RETRIES || '3'),
validateChecksum: process.env.MIGRATION_VALIDATE === 'true'
};
const migration = new UserDataMigration(db, config);
await migration.migrate();
}
if (require.main === module) {
main().catch(console.error);
}
Phase 4: Validation and Cutover
Create comprehensive validation before switching over:
// scripts/validate-migration.ts
import { DatabaseClient } from '../services/database';
import { createLogger } from '../utils/logger';
const logger = createLogger('migration-validator');
export class MigrationValidator {
private db: DatabaseClient;
constructor(db: DatabaseClient) {
this.db = db;
}
async validateComplete(): Promise<boolean> {
logger.info('Starting comprehensive migration validation...');
try {
await this.validateRecordCounts();
await this.validateDataIntegrity();
await this.validateConstraints();
await this.validateIndexes();
await this.validatePerformance();
logger.info('All validation checks passed');
return true;
} catch (error) {
logger.error('Validation failed', { error: error.message });
return false;
}
}
private async validateRecordCounts(): Promise<void> {
const [oldCount] = await this.db.query('SELECT COUNT(*) as count FROM users');
const [newCount] = await this.db.query('SELECT COUNT(*) as count FROM users_new');
if (oldCount.count !== newCount.count) {
throw new Error(`Record count mismatch: old=${oldCount.count}, new=${newCount.count}`);
}
logger.info(`Record count validation passed: ${oldCount.count} records`);
}
private async validateDataIntegrity(): Promise<void> {
// Check for missing records
const missingRecords = await this.db.query(`
SELECT COUNT(*) as count
FROM users u
LEFT JOIN users_new un ON u.id = un.id
WHERE un.id IS NULL
`);
if (missingRecords[0].count > 0) {
throw new Error(`${missingRecords[0].count} records missing from new table`);
}
// Sample data comparison
const sampleResults = await this.db.query(`
SELECT
COUNT(*) as total,
COUNT(CASE WHEN u.email = un.email THEN 1 END) as email_match,
COUNT(CASE WHEN u.name = un.name THEN 1 END) as name_match,
COUNT(CASE WHEN u.status = un.status THEN 1 END) as status_match
FROM users u
JOIN users_new un ON u.id = un.id
WHERE u.id % 100 = 0 -- Sample every 100th record
`);
const sample = sampleResults[0];
if (sample.email_match !== sample.total ||
sample.name_match !== sample.total ||
sample.status_match !== sample.total) {
throw new Error(`Data integrity issues found in sample validation`);
}
logger.info(`Data integrity validation passed: ${sample.total} sample records checked`);
}
private async validateConstraints(): Promise<void> {
// Check unique constraints
const duplicateEmails = await this.db.query(`
SELECT email, COUNT(*) as count
FROM users_new
GROUP BY email
HAVING COUNT(*) > 1
`);
if (duplicateEmails.length > 0) {
throw new Error(`Duplicate emails found: ${duplicateEmails.length} violations`);
}
// Check NOT NULL constraints
const nullViolations = await this.db.query(`
SELECT COUNT(*) as count
FROM users_new
WHERE email IS NULL OR name IS NULL
`);
if (nullViolations[0].count > 0) {
throw new Error(`NULL constraint violations: ${nullViolations[0].count} records`);
}
logger.info('Constraint validation passed');
}
private async validateIndexes(): Promise<void> {
// Verify all expected indexes exist
const expectedIndexes = [
'idx_users_new_email',
'idx_users_new_status',
'idx_users_new_created_at'
];
for (const indexName of expectedIndexes) {
const indexExists = await this.db.query(`
SELECT 1 FROM pg_indexes
WHERE tablename = 'users_new' AND indexname = $1
`, [indexName]);
if (indexExists.length === 0) {
throw new Error(`Index ${indexName} is missing`);
}
}
logger.info(`Index validation passed: ${expectedIndexes.length} indexes verified`);
}
private async validatePerformance(): Promise<void> {
// Run sample queries and compare performance
const testQueries = [
"SELECT * FROM users_new WHERE email = 'test@example.com'",
"SELECT COUNT(*) FROM users_new WHERE status = 'active'",
"SELECT * FROM users_new WHERE created_at > NOW() - INTERVAL '1 day'"
];
for (const query of testQueries) {
const startTime = Date.now();
await this.db.query(query);
const newTableTime = Date.now() - startTime;
const startTimeOld = Date.now();
await this.db.query(query.replace('users_new', 'users'));
const oldTableTime = Date.now() - startTimeOld;
// New table should perform within 2x of old table
if (newTableTime > oldTableTime * 2) {
logger.warn(`Performance regression detected for query: ${query}`, {
oldTime: oldTableTime,
newTime: newTableTime
});
}
}
logger.info('Performance validation completed');
}
}
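Once every validation check passes, the cutover itself can be a pair of renames inside a single transaction, which makes the swap atomic from the application's point of view. A sketch of the approach I normally use (pause dual-writes first, and keep the old table around until the post-migration monitoring window has passed):
BEGIN;
-- Takes a brief ACCESS EXCLUSIVE lock on both tables; keep this transaction tiny.
ALTER TABLE users RENAME TO users_old;
ALTER TABLE users_new RENAME TO users;
COMMIT;

-- Only after the post-migration monitoring period:
-- DROP TABLE users_old;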
Advanced Migration Patterns
Rolling Schema Updates
For applications that can't afford downtime, use backwards-compatible schema changes:
-- Phase 1: Add new column as nullable
ALTER TABLE users ADD COLUMN preferences JSONB;
-- Phase 2: Update application to write to both old and new format
-- (Application deployment)
-- Phase 3: Backfill existing data
UPDATE users
SET preferences = '{}'::jsonb
WHERE preferences IS NULL;
-- Phase 4: Add NOT NULL constraint
ALTER TABLE users ALTER COLUMN preferences SET NOT NULL;
-- Phase 5: Remove old columns (if any) after confirming new system works
-- ALTER TABLE users DROP COLUMN old_column;
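One caveat from experience: on a table the size of the one in the opening story, the Phase 3 backfill should itself run in batches rather than as one giant UPDATE, so it never holds long locks or bloats the table in a single transaction. A sketch of a batched backfill, repeated until it updates zero rows:
WITH batch AS (
  SELECT id FROM users
  WHERE preferences IS NULL
  ORDER BY id
  LIMIT 10000
)
UPDATE users u
SET preferences = '{}'::jsonb
FROM batch
WHERE u.id = batch.id;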
Blue-Green Database Deployment
For major version upgrades or engine changes:
// services/database-switcher.ts
import { DatabaseClient } from './database';
class DatabaseSwitcher {
private blueDb: DatabaseClient;
private greenDb: DatabaseClient;
private currentColor: 'blue' | 'green';
constructor() {
this.blueDb = new DatabaseClient(process.env.BLUE_DATABASE_URL);
this.greenDb = new DatabaseClient(process.env.GREEN_DATABASE_URL);
this.currentColor = process.env.ACTIVE_DATABASE as 'blue' | 'green' || 'blue';
}
getCurrentDatabase(): DatabaseClient {
return this.currentColor === 'blue' ? this.blueDb : this.greenDb;
}
getInactiveDatabase(): DatabaseClient {
return this.currentColor === 'blue' ? this.greenDb : this.blueDb;
}
async switchToInactive(): Promise<void> {
// Pre-switch validation
await this.validateInactiveDatabase();
// Switch traffic
this.currentColor = this.currentColor === 'blue' ? 'green' : 'blue';
// Update environment variable for other processes
process.env.ACTIVE_DATABASE = this.currentColor;
// Post-switch validation
await this.validateCurrentDatabase();
}
async validateInactiveDatabase(): Promise<void> {
const inactiveDb = this.getInactiveDatabase();
// Check connectivity
await inactiveDb.query('SELECT 1');
// Validate data freshness (should be recent)
const result = await inactiveDb.query(`
SELECT MAX(updated_at) as last_update FROM users
`);
const lastUpdate = new Date(result[0].last_update);
const now = new Date();
const staleness = now.getTime() - lastUpdate.getTime();
if (staleness > 5 * 60 * 1000) { // 5 minutes
throw new Error(`Inactive database is stale by ${staleness}ms`);
}
}
async validateCurrentDatabase(): Promise<void> {
const currentDb = this.getCurrentDatabase();
// Run health checks
await currentDb.query('SELECT COUNT(*) FROM users');
// Could add application-level health checks here
}
}
export default DatabaseSwitcher;
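In practice I run the switch from a small operational script so that a failed pre-switch validation aborts before any traffic moves. A hypothetical sketch:
// scripts/switch-database.ts (illustrative)
import DatabaseSwitcher from '../services/database-switcher';

async function main() {
  const switcher = new DatabaseSwitcher();
  try {
    await switcher.switchToInactive();
    console.log('Switched active database; pre- and post-switch checks passed');
  } catch (error) {
    // A pre-switch validation failure throws before the color flips;
    // a post-switch failure means traffic moved and needs a manual switch back.
    console.error('Switch aborted:', error);
    process.exitCode = 1;
  }
}

main().catch(console.error);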
Monitoring and Alerting
Set up comprehensive monitoring during migrations:
// utils/migration-monitor.ts
import { DatabaseClient } from '../services/database';
import { MetricsClient } from './metrics';
class MigrationMonitor {
private db: DatabaseClient;
private metrics: MetricsClient;
private monitoringInterval: NodeJS.Timeout | null = null;
constructor(db: DatabaseClient, metrics: MetricsClient) {
this.db = db;
this.metrics = metrics;
}
startMonitoring(): void {
this.monitoringInterval = setInterval(async () => {
try {
await this.collectMetrics();
} catch (error) {
console.error('Monitoring error:', error);
}
}, 30000); // Every 30 seconds
}
stopMonitoring(): void {
if (this.monitoringInterval) {
clearInterval(this.monitoringInterval);
this.monitoringInterval = null;
}
}
private async collectMetrics(): Promise<void> {
// Database performance metrics
const performance = await this.db.query(`
SELECT
schemaname,
relname as tablename,
n_tup_ins as inserts,
n_tup_upd as updates,
n_tup_del as deletes,
seq_scan as sequential_scans,
idx_scan as index_scans
FROM pg_stat_user_tables
WHERE relname IN ('users', 'users_new')
`);
performance.forEach(table => {
this.metrics.gauge(`db.table.${table.tablename}.inserts`, table.inserts);
this.metrics.gauge(`db.table.${table.tablename}.updates`, table.updates);
this.metrics.gauge(`db.table.${table.tablename}.deletes`, table.deletes);
this.metrics.gauge(`db.table.${table.tablename}.seq_scans`, table.sequential_scans);
this.metrics.gauge(`db.table.${table.tablename}.idx_scans`, table.index_scans);
});
// Migration progress
const progress = await this.db.query(`
SELECT last_migrated_id FROM migration_progress
WHERE migration_name = 'users_to_users_new'
`);
if (progress.length > 0) {
this.metrics.gauge('migration.last_migrated_id', progress[0].last_migrated_id);
}
// Replication lag (if using logical replication)
const replicationLag = await this.db.query(`
SELECT
slot_name,
confirmed_flush_lsn,
pg_current_wal_lsn() as current_lsn,
pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) as lag_bytes
FROM pg_replication_slots
`);
replicationLag.forEach(slot => {
this.metrics.gauge(`db.replication.${slot.slot_name}.lag_bytes`, slot.lag_bytes || 0);
});
// Connection counts
const connections = await this.db.query(`
SELECT
state,
COUNT(*) as count
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY state
`);
connections.forEach(conn => {
this.metrics.gauge(`db.connections.${conn.state}`, conn.count);
});
}
async alertOnAnomalies(): Promise<void> {
// Check for migration stalls
const lastProgress = await this.db.query(`
SELECT
last_migrated_id,
updated_at,
EXTRACT(EPOCH FROM (NOW() - updated_at)) as stall_seconds
FROM migration_progress
WHERE migration_name = 'users_to_users_new'
`);
if (lastProgress.length > 0 && lastProgress[0].stall_seconds > 300) {
this.metrics.increment('migration.alerts.stall');
// Send alert to monitoring system
await this.sendAlert('Migration stalled', {
seconds: lastProgress[0].stall_seconds,
lastId: lastProgress[0].last_migrated_id
});
}
// Check for high recovery-conflict rates (only populated on standbys)
const errorLogs = await this.db.query(`
SELECT COALESCE(SUM(confl_tablespace + confl_lock + confl_snapshot + confl_bufferpin + confl_deadlock), 0) as error_count
FROM pg_stat_database_conflicts
WHERE datname = current_database()
`);
if (errorLogs[0].error_count > 10) {
await this.sendAlert('High database error rate', {
errorCount: errorLogs[0].error_count
});
}
}
private async sendAlert(message: string, data: any): Promise<void> {
// Integration with your alerting system (Slack, PagerDuty, etc.)
console.error(`ALERT: ${message}`, data);
// Example Slack webhook
if (process.env.SLACK_WEBHOOK_URL) {
await fetch(process.env.SLACK_WEBHOOK_URL, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
text: `🚨 Migration Alert: ${message}`,
attachments: [{
color: 'danger',
fields: Object.entries(data).map(([key, value]) => ({
title: key,
value: String(value),
short: true
}))
}]
})
});
}
}
}
export default MigrationMonitor;
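I start the monitor right before kicking off the background migration and stop it in a finally block so metrics never keep flowing after the run ends. A sketch, assuming MetricsClient can be constructed without arguments:
// scripts/run-migration-with-monitoring.ts (illustrative)
import { DatabaseClient } from '../services/database';
import { MetricsClient } from '../utils/metrics';
import MigrationMonitor from '../utils/migration-monitor';

async function main() {
  const db = new DatabaseClient();
  const metrics = new MetricsClient(); // assumed zero-argument constructor
  const monitor = new MigrationMonitor(db, metrics);

  monitor.startMonitoring();
  try {
    // ... run the background data migration from Phase 3 here ...
    await monitor.alertOnAnomalies();
  } finally {
    monitor.stopMonitoring();
  }
}

main().catch(console.error);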
Testing Strategies
Automated Migration Testing
// tests/migration.test.ts
import { DatabaseClient } from '../services/database';
import { UserDataMigration } from '../scripts/migrate-users-data';
import { MigrationValidator } from '../scripts/validate-migration';
describe('User Migration', () => {
let db: DatabaseClient;
let migration: UserDataMigration;
let validator: MigrationValidator;
beforeAll(async () => {
// Use test database
db = new DatabaseClient(process.env.TEST_DATABASE_URL);
migration = new UserDataMigration(db, {
batchSize: 100,
delayMs: 0, // No delay in tests
maxRetries: 3,
validateChecksum: true
});
validator = new MigrationValidator(db);
});
beforeEach(async () => {
// Clean slate for each test
await db.query('TRUNCATE TABLE users, users_new RESTART IDENTITY CASCADE');
await db.query('DELETE FROM migration_progress');
});
afterAll(async () => {
await db.close();
});
test('should migrate small dataset successfully', async () => {
// Create test data
const testUsers = [
{ email: 'user1@test.com', name: 'User One', status: 'active' },
{ email: 'user2@test.com', name: 'User Two', status: 'inactive' },
{ email: 'user3@test.com', name: 'User Three', status: 'active' }
];
for (const user of testUsers) {
await db.query(`
INSERT INTO users (email, name, status) VALUES ($1, $2, $3)
`, [user.email, user.name, user.status]);
}
// Run migration
await migration.migrate();
// Validate results
const isValid = await validator.validateComplete();
expect(isValid).toBe(true);
// Check specific data
const migratedUsers = await db.query('SELECT * FROM users_new ORDER BY id');
expect(migratedUsers).toHaveLength(3);
expect(migratedUsers[0].email).toBe('user1@test.com');
expect(migratedUsers[0].preferences).toEqual({});
});
test('should handle large dataset with batching', async () => {
// Create large test dataset
const batchSize = 1000;
const totalRecords = 5000;
for (let i = 0; i < totalRecords; i += batchSize) {
const values = [];
const placeholders = [];
for (let j = 0; j < Math.min(batchSize, totalRecords - i); j++) {
const userIndex = i + j + 1;
values.push(`user${userIndex}@test.com`, `User ${userIndex}`, 'active');
placeholders.push(`($${j * 3 + 1}, $${j * 3 + 2}, $${j * 3 + 3})`);
}
await db.query(`
INSERT INTO users (email, name, status)
VALUES ${placeholders.join(', ')}
`, values);
}
// Run migration
const startTime = Date.now();
await migration.migrate();
const duration = Date.now() - startTime;
// Validate
const [userCount] = await db.query('SELECT COUNT(*) as count FROM users_new');
expect(userCount.count).toBe(totalRecords);
// Performance expectation (should complete within reasonable time)
expect(duration).toBeLessThan(30000); // 30 seconds
});
test('should handle migration resume after interruption', async () => {
// Create test data
for (let i = 1; i <= 1000; i++) {
await db.query(`
INSERT INTO users (email, name, status) VALUES ($1, $2, $3)
`, [`user${i}@test.com`, `User ${i}`, 'active']);
}
// Simulate partial migration
await db.query(`
INSERT INTO migration_progress (migration_name, last_migrated_id)
VALUES ('users_to_users_new', 500)
ON CONFLICT (migration_name)
DO UPDATE SET last_migrated_id = EXCLUDED.last_migrated_id
`);
// Migrate first 500 records manually to simulate previous run
await db.query(`
INSERT INTO users_new (id, email, name, status, preferences, created_at, updated_at)
SELECT id, email, name, status, '{}'::jsonb, created_at, updated_at
FROM users WHERE id <= 500
`);
// Resume migration
await migration.migrate();
// Validate all records are migrated
const [totalCount] = await db.query('SELECT COUNT(*) as count FROM users_new');
expect(totalCount.count).toBe(1000);
// Validate no duplicates
const [duplicateCount] = await db.query(`
SELECT COUNT(*) as count FROM (
SELECT id FROM users_new GROUP BY id HAVING COUNT(*) > 1
) duplicates
`);
expect(duplicateCount.count).toBe(0);
});
test('should rollback on validation failure', async () => {
// Create invalid test scenario
await db.query(`
INSERT INTO users (email, name, status) VALUES
('user1@test.com', 'User One', 'active'),
('user2@test.com', 'User Two', 'active')
`);
// Manually corrupt the new table to trigger validation failure
await db.query(`
INSERT INTO users_new (id, email, name, status, preferences, created_at, updated_at)
VALUES (1, 'wrong@test.com', 'Wrong User', 'active', '{}', NOW(), NOW())
`);
// Migration should detect the corruption
const isValid = await validator.validateComplete();
expect(isValid).toBe(false);
});
});
Real-World Lessons Learned
Lesson 1: Always Test with Production-Like Data
One migration that looked perfect in staging failed spectacularly in production because our test data didn't include the edge cases present in real user data. A few users had null bytes in their names, which caused the migration to crash.
Solution: Always use anonymized production data for migration testing:
-- Create anonymized test data
CREATE TABLE users_test AS
SELECT
id,
'user' || id || '@test.com' as email,
'Test User ' || id as name,
created_at,
updated_at,
status,
-- Keep data structures but anonymize content
CASE
WHEN preferences IS NOT NULL THEN '{"theme": "light"}'::jsonb
ELSE NULL
END as preferences
FROM users
WHERE id % 100 = 0; -- Every 100th user for representative sample
Lesson 2: Monitor Everything
During a major migration, we discovered that our application was making 10x more database queries than expected because of a poorly optimized query we had forgotten to update. The migration technically worked, but it brought down the application.
Solution: Set up comprehensive monitoring before starting:
// Migration monitoring dashboard
const migrationMetrics = {
// Database metrics
recordsPerSecond: await calculateMigrationRate(),
errorRate: await getErrorRate(),
connectionCount: await getConnectionCount(),
// Application metrics
apiLatency: await getAverageApiLatency(),
errorResponses: await getErrorResponses(),
userExperience: await getUserExperienceMetrics(),
// System metrics
cpuUsage: await getCPUUsage(),
memoryUsage: await getMemoryUsage(),
diskIO: await getDiskIOMetrics()
};
Lesson 3: Plan for the Worst-Case Scenario
I once had to roll back a migration that was 90% complete because we discovered data corruption in the new table. Having a robust rollback plan saved us from a complete disaster.
Rollback Strategy Template:
-- Immediate rollback plan
BEGIN;
-- Step 1: Stop all writes to new table
UPDATE application_config SET maintenance_mode = true;
-- Step 2: Switch application back to old table
UPDATE application_config SET active_database_table = 'users';
-- Step 3: Clean up corrupted data if needed
-- DELETE FROM users_new WHERE corruption_condition;
-- Step 4: Reset migration progress
UPDATE migration_progress SET last_migrated_id = 0 WHERE migration_name = 'users_to_users_new';
COMMIT;
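For that rollback to actually take effect, the application's read path has to consult the same flag the rollback flips. A minimal sketch, assuming the application_config table from the template above:
// services/table-resolver.ts (illustrative)
import { DatabaseClient } from './database';

// Resolves which physical table the application should query,
// based on the application_config row the rollback template updates.
export async function getActiveUsersTable(db: DatabaseClient): Promise<string> {
  const rows = await db.query(
    'SELECT active_database_table FROM application_config LIMIT 1'
  );
  return rows[0]?.active_database_table || 'users';
}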
Lesson 4: Communication is Critical
Keep all stakeholders informed throughout the migration:
// Migration status reporter
class MigrationReporter {
async sendStatusUpdate(phase: string, progress: number, eta?: Date) {
const message = {
phase,
progress: `${progress.toFixed(2)}%`,
eta: eta?.toISOString(),
timestamp: new Date().toISOString()
};
// Notify different stakeholders appropriately
await this.notifyEngineering(message);
await this.notifyBusinessStakeholders(message);
await this.updateStatusPage(message);
}
async notifyEngineering(message: any) {
// Detailed technical updates to engineering Slack
}
async notifyBusinessStakeholders(message: any) {
// High-level updates to business stakeholders
}
async updateStatusPage(message: any) {
// Public status page updates if needed
}
}
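A short example of how I call this at each phase boundary (the phase name and ETA are illustrative):
// After each phase of the migration (illustrative)
const reporter = new MigrationReporter();
await reporter.sendStatusUpdate(
  'backfill',
  42.5,
  new Date(Date.now() + 60 * 60 * 1000) // rough ETA: one hour from now
);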
Migration Checklist
Use this checklist for every production migration:
Pre-Migration
- [ ] Full database backup created and verified
- [ ] Migration tested on production-like data
- [ ] Performance impact assessed
- [ ] Rollback plan documented and tested
- [ ] Monitoring and alerting configured
- [ ] Stakeholder communication plan ready
- [ ] Maintenance window scheduled (if needed)
During Migration
- [ ] Monitor key metrics continuously
- [ ] Validate data integrity in batches
- [ ] Check application performance
- [ ] Monitor error logs
- [ ] Keep stakeholders updated
Post-Migration
- [ ] Comprehensive data validation
- [ ] Performance benchmarking
- [ ] Monitor for 24-48 hours
- [ ] Document lessons learned
- [ ] Clean up old resources (after validation period)
Database migrations will always carry risk, but with proper planning, testing, and monitoring, you can minimize that risk significantly. The key is to assume something will go wrong and prepare accordingly. Every migration teaches you something new about your system—embrace those lessons and build them into your next migration strategy.
Remember: it's better to take extra time for preparation than to explain a data loss incident to your CEO.