2026-03-12 | PreviewProof Team

Database Migrations and Seed Data for Ephemeral Preview Environments

Tags: database migrations, seed data, preview environments, PostgreSQL, Prisma, ephemeral environments

An ephemeral preview environment with an empty database is almost useless. The login page works, but there are no users to log in as. The dashboard renders, but every list is empty. The feature you’re trying to test requires three related records that don’t exist yet. The reviewer clicks around, sees blank screens, and closes the tab. Getting the database into a useful state on every environment boot — schema applied, realistic data loaded, ready to test — is the part of preview environments that takes the most thought and the most iteration to get right.

Migrations Need to Run Automatically and Idempotently

In a long-lived staging environment, you run migrations once and move on. In an ephemeral environment, the database starts from scratch every time the environment spins up. Migrations need to run on boot, without manual intervention, and without failing if they’ve already been applied.

Most migration tools handle this well. The pattern is the same regardless of the tool — run the deploy/apply command in your container’s entrypoint before starting the application server.

#!/bin/sh
# Node/Prisma entrypoint
set -e
npx prisma migrate deploy
node server.js

#!/bin/sh
# Python/Alembic entrypoint
set -e
alembic upgrade head
gunicorn "app:create_app()"

Prisma’s migrate deploy applies pending migrations and skips ones already recorded in the _prisma_migrations table. Alembic’s upgrade head does the same against its alembic_version table. The same pattern applies across ecosystems — Drizzle, TypeORM, Sequelize, golang-migrate, Goose, Flyway, and Liquibase all provide a non-interactive command that applies pending migrations and skips already-applied ones. The tool matters less than the constraint: migrations must be non-interactive and idempotent. If the environment restarts or the init process runs twice, nothing should break.
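For reference, the non-interactive apply commands look roughly like this across those tools. The exact flags and paths vary by version and project layout, so treat these as illustrative rather than copy-paste ready:

```shell
# Representative "apply pending migrations" commands (flags are illustrative)
npx prisma migrate deploy
alembic upgrade head
npx drizzle-kit migrate
npx typeorm migration:run -d ./src/data-source.ts
npx sequelize-cli db:migrate
migrate -path ./migrations -database "$DATABASE_URL" up   # golang-migrate
goose -dir ./migrations postgres "$DATABASE_URL" up
flyway migrate
liquibase update
```

Whichever tool you use, the entrypoint line is the same shape: apply, then start the server.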

One common mistake with Prisma specifically: running prisma migrate dev in ephemeral environments instead of prisma migrate deploy. The dev command diffs your schema against the database and tries to generate new migrations — useful locally, dangerous in CI or preview environments where it can drop and recreate tables. Alembic has a similar footgun — alembic revision --autogenerate should never run in an automated environment. Always use the apply/deploy command, never the generate command.

Seed Data Strategies: Trade-Offs Between Speed and Realism

Once the schema is applied, you need data. There are three common approaches, each with different trade-offs.

Scripted seed files. A script that inserts a fixed set of records — a few users, some sample projects, enough data to make the UI functional. This is the simplest approach and the easiest to maintain.

prisma/seed.ts
import { PrismaClient } from '@prisma/client'
import { hash } from 'bcrypt'

const prisma = new PrismaClient()

function daysAgo(n: number) {
  return new Date(Date.now() - n * 86400000)
}

function hoursAgo(n: number) {
  return new Date(Date.now() - n * 3600000)
}

async function main() {
  const password = await hash('preview123', 10)
  const alice = await prisma.user.create({
    data: {
      email: 'alice@example.com', // known login for reviewers
      name: 'Alice Chen',
      passwordHash: password,
      role: 'ADMIN',
    },
  })
  const bob = await prisma.user.create({
    data: {
      email: 'bob@example.com',
      name: 'Bob Martinez',
      passwordHash: password,
      role: 'MEMBER',
    },
  })
  const project = await prisma.project.create({
    data: {
      name: 'Acme Web App',
      ownerId: alice.id,
      members: { connect: [{ id: alice.id }, { id: bob.id }] },
    },
  })
  // Seed enough history to make dashboards useful
  await prisma.deployment.createMany({
    data: [
      { projectId: project.id, status: 'SUCCESS', branch: 'main', createdAt: daysAgo(3) },
      { projectId: project.id, status: 'SUCCESS', branch: 'feat/auth', createdAt: daysAgo(1) },
      { projectId: project.id, status: 'FAILED', branch: 'fix/timeout', createdAt: hoursAgo(2) },
    ],
  })
}

main()
  .catch((e) => {
    console.error(e)
    process.exitCode = 1
  })
  .finally(() => prisma.$disconnect())

Scripted seeds are deterministic — every preview environment gets the same data, which makes testing predictable. The downside is maintenance. Every schema change that touches seeded tables requires updating the seed script, and the data tends to be sparse.
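The same idempotency constraint that applies to migrations applies here: if the environment restarts and the seed script runs a second time, it should not fail on unique-constraint violations. With Prisma, one option is to upsert on a unique column instead of creating. A minimal sketch, assuming the User model has a unique email field:

```typescript
import { PrismaClient } from '@prisma/client'
import { hash } from 'bcrypt'

const prisma = new PrismaClient()

// Re-runnable seed: upsert keyed on the unique email column. A second run
// finds the existing rows and leaves them alone instead of throwing on a
// duplicate insert.
async function seedUsers() {
  const passwordHash = await hash('preview123', 10)
  for (const u of [
    { email: 'alice@example.com', name: 'Alice Chen', role: 'ADMIN' },
    { email: 'bob@example.com', name: 'Bob Martinez', role: 'MEMBER' },
  ]) {
    await prisma.user.upsert({
      where: { email: u.email },
      update: {},                      // already seeded: no changes
      create: { ...u, passwordHash },
    })
  }
}
```

For tables seeded with createMany, a cruder but effective alternative is to check whether the table already has rows and skip seeding entirely if it does.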

Faker-generated datasets. A script that uses a library like @faker-js/faker to generate a larger, more realistic dataset. This produces previews that look and feel closer to production.

prisma/seed.ts
import { PrismaClient } from '@prisma/client'
import { faker } from '@faker-js/faker'
import { hash } from 'bcrypt'

const prisma = new PrismaClient()

// Seed the PRNG so every preview gets identical data
faker.seed(42)

async function main() {
  const password = await hash('preview123', 10)

  // Always create known test accounts first
  const admin = await prisma.user.create({
    data: {
      email: 'admin@example.com', // known login for reviewers
      name: 'Test Admin',
      passwordHash: password,
      role: 'ADMIN',
    },
  })

  // Generate realistic-looking users
  const users = await Promise.all(
    Array.from({ length: 20 }, () =>
      prisma.user.create({
        data: {
          email: faker.internet.email(),
          name: faker.person.fullName(),
          passwordHash: password,
          // Duplicate entries weight the distribution toward MEMBER
          role: faker.helpers.arrayElement(['MEMBER', 'MEMBER', 'VIEWER']),
          createdAt: faker.date.past({ years: 1 }),
        },
      })
    )
  )
  const allUsers = [admin, ...users]

  // Generate projects with realistic ownership and membership
  for (let i = 0; i < 5; i++) {
    const owner = faker.helpers.arrayElement(allUsers)
    const memberPool = allUsers.filter((u) => u.id !== owner.id)
    const members = faker.helpers.arrayElements(memberPool, { min: 2, max: 6 })
    const project = await prisma.project.create({
      data: {
        name: `${faker.company.buzzAdjective()} ${faker.company.buzzNoun()}`,
        ownerId: owner.id,
        members: { connect: members.map((m) => ({ id: m.id })) },
      },
    })

    // Generate deployment history so dashboards aren't empty
    await prisma.deployment.createMany({
      data: Array.from({ length: faker.number.int({ min: 5, max: 20 }) }, () => ({
        projectId: project.id,
        status: faker.helpers.weightedArrayElement([
          { value: 'SUCCESS', weight: 7 },
          { value: 'FAILED', weight: 2 },
          { value: 'RUNNING', weight: 1 },
        ]),
        branch: faker.git.branch(),
        createdAt: faker.date.recent({ days: 30 }),
      })),
    })
  }
}

main()
  .catch((e) => {
    console.error(e)
    process.exitCode = 1
  })
  .finally(() => prisma.$disconnect())

The faker.seed(42) call is important. Without a fixed seed, every preview environment generates different data, which makes bug reports harder to reproduce — “I saw this on the preview” means nothing if the data is different on every boot. A fixed seed gives you the volume and variety of faker data with the determinism of scripted seeds.
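If you want the fixed seed to be overridable (say, to reproduce a specific dataset locally) without giving up the default determinism, a small helper can resolve it from an environment variable. This is a sketch with invented names (resolveSeed, fnv1a), not part of faker's API; faker.seed() itself simply takes a number:

```typescript
// Resolve a deterministic integer seed for faker.seed():
// - no value: a fixed default, so every preview gets identical data
// - numeric value: used as-is
// - any other string (e.g. a branch name): hashed to a stable integer
function fnv1a(s: string): number {
  let h = 0x811c9dc5
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i)
    h = Math.imul(h, 0x01000193) >>> 0
  }
  return h >>> 0
}

function resolveSeed(envValue?: string): number {
  if (!envValue) return 42
  const n = Number(envValue)
  return Number.isInteger(n) ? n : fnv1a(envValue)
}

// usage: faker.seed(resolveSeed(process.env.SEED))
```

The same input always yields the same seed, so a bug report can cite the seed value and anyone can regenerate the exact dataset.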

Prebuilt database images. Instead of running migrations and seeds at boot time, you bake the fully seeded database into a Docker image during CI. The image includes the PostgreSQL data directory with migrations applied and seed data loaded. Every preview environment pulls the same image and starts with an identical, ready-to-use database in seconds.

# db/Dockerfile
FROM postgres:16 AS build
ENV POSTGRES_USER=postgres
ENV POSTGRES_DB=app
ENV POSTGRES_HOST_AUTH_METHOD=trust
# The stock image declares VOLUME /var/lib/postgresql/data; data written there
# during a RUN step can be discarded at build time, so use a different PGDATA
ENV PGDATA=/var/lib/postgresql/seeded
# The entrypoint runs *.sql files from /docker-entrypoint-initdb.d in sorted
# order and ignores subdirectories, so copy the files in flat
COPY migrations/*.sql /docker-entrypoint-initdb.d/
COPY seed.sql /docker-entrypoint-initdb.d/zz-seed.sql
# Boot postgres, let it run the init scripts, then stop cleanly. The init-phase
# server listens only on the unix socket, so waiting for TCP means the init
# scripts have finished and the final server is up.
RUN docker-entrypoint.sh postgres & \
    until pg_isready -h 127.0.0.1 -U postgres; do sleep 1; done && \
    gosu postgres pg_ctl -D "$PGDATA" stop -m fast
# Final image starts with a pre-loaded data directory
FROM postgres:16
ENV PGDATA=/var/lib/postgresql/seeded
COPY --from=build --chown=postgres:postgres /var/lib/postgresql/seeded /var/lib/postgresql/seeded

This is the same pattern many teams already use for local development with Docker Compose — everyone pulls the same prebuilt database image so docker compose up gives you a working environment immediately. Extending it to preview environments means your CI builds the image on schema or seed changes and pushes it to your registry:

# In CI — only rebuild when db files change
- name: Build seeded database image
  run: |
    docker build -t ghcr.io/${{ github.repository }}/db:${{ github.sha }} ./db
    docker push ghcr.io/${{ github.repository }}/db:${{ github.sha }}

The advantage over runtime seeding is boot speed — the database is ready the moment the container starts, no migration or seed scripts to run. The advantage over snapshot restores is reproducibility — the image is built from your migration and seed files in version control, not from a dump that may have drifted. The tradeoff is an extra CI step, but teams that do this typically gate the rebuild — only trigger it when files in migrations/ or seed.sql change, and reuse the cached image otherwise.
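One way to gate the rebuild, assuming GitHub Actions and a db/ directory layout like the one above, is a paths filter on the workflow trigger, so the job only runs when the files the image is built from actually change:

```yaml
# .github/workflows/db-image.yml (hypothetical filename): rebuild the seeded
# image only when its inputs change
on:
  push:
    paths:
      - 'db/migrations/**'
      - 'db/seed.sql'
      - 'db/Dockerfile'
```

Pushes that touch only application code skip the job and keep using the most recently published image tag.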

Which Tables to Seed and Which to Leave Empty

Not every table needs data. The goal is to seed enough that the reviewer can navigate the application and test the feature under review without manual data entry.

Always seed: Users and authentication (so reviewers can log in), any entity that appears in navigation or dashboards (projects, organizations, workspaces), and lookup tables (roles, statuses, categories).

Seed with history: Tables that drive charts, timelines, or activity feeds. An empty activity feed tells the reviewer nothing. A feed with 30 days of realistic entries shows whether the feature works in context.

Leave empty: Tables that the feature under test is supposed to create. If the PR adds an invoicing feature, don’t seed invoices — let the reviewer test the creation flow. Seed the prerequisites (customers, products, pricing) but not the output.

Never seed with real data. Production database dumps — even “sanitized” ones — are a liability. Email addresses, names, and behavioral patterns in production data can identify real users even with passwords stripped. Use faker for everything. It’s not significantly more work, and it eliminates an entire class of compliance and privacy risk.

The seed script is code that needs maintenance, the same as any other part of your application. When you add a required column to the users table, the seed script needs to include it. When you add a new entity that appears on the dashboard, the seed script should generate instances of it. Treat seed scripts as part of your schema change workflow — update them in the same PR that changes the schema.
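A cheap way to enforce that is a CI job that applies migrations and runs the seed against a throwaway database on every PR, so schema drift fails the build instead of producing a broken preview. A sketch assuming Docker is available and a Prisma project with a seed command wired up in package.json:

```shell
# PR check: migrations + seed must apply cleanly to an empty database
docker run -d --name seedcheck -p 5432:5432 \
  -e POSTGRES_HOST_AUTH_METHOD=trust -e POSTGRES_DB=app postgres:16
# Wait for the final server (TCP), not the init-phase socket-only server
until docker exec seedcheck pg_isready -h 127.0.0.1 -U postgres; do sleep 1; done
export DATABASE_URL='postgresql://postgres@localhost:5432/app'
npx prisma migrate deploy
npx prisma db seed
docker rm -f seedcheck
```

If either step exits non-zero, the seed script and schema have drifted apart and the PR that changed the schema is the right place to fix it.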

PreviewProof runs your seeded containers as part of the preview environment, so the database your reviewers see is the one your seed script produces — with structured testing checklists attached so reviewers know which flows to verify against the seeded data.

Ephemeral environments are only as useful as the data they start with. An empty database produces an empty demo. A thoughtfully seeded database produces a testable preview.