2025-12-30 | PreviewProof Team

Background Jobs in Ephemeral Preview Environments: Patterns and Pitfalls

background jobs, preview environments, Sidekiq, Celery, BullMQ, ephemeral environments

Every backend stack has a job runner. Rails has Sidekiq and GoodJob. Python has Celery, RQ, and Dramatiq. Node has BullMQ. .NET has Hangfire. Elixir has Oban. They were all designed with one assumption that ephemeral preview environments break: that the worker process lives a long time on stable infrastructure, with persistent connections to a queue backend that’s been there since before the worker started and will be there after.

Previews violate every part of that. A preview spins up when a PR opens, runs for hours or days, and gets destroyed when the PR merges or closes. The queue and the worker come and go with it. Most teams either disable background jobs entirely in previews — and then can’t test the half of the application that uses them — or pretend the problems don’t exist and ship bugs that only manifest under realistic load.

There are patterns that work. Most of them involve being explicit about what jobs are for in a preview, which is different from what they’re for in production.

What goes wrong by default

Scheduled jobs run at the wrong times. Your schedule (a whenever schedule file, say) reads “run the daily billing rollup at 02:00 UTC.” In production, fine. In preview, the same configuration loads by default, so at 02:00 UTC every preview fires the billing rollup. Against test data. Possibly making real Stripe API calls.

Workers don’t shut down cleanly. A preview is destroyed by SIGTERMing its containers. A Sidekiq worker that is killed before it finishes draining can leave a job half-done: lost outright or stranded in an in-progress state in Redis, depending on how the worker fetches jobs. If you’re sharing Redis across previews, the orphaned job persists and confuses the next worker.

Jobs depend on long-lived state. “Schedule a follow-up job 24 hours from now to send a reminder email.” The schedule is enqueued, the preview is destroyed in two hours, the reminder fires against nothing — or against shared infrastructure.

End-to-end flows are untestable. The user clicks “export my data,” the controller enqueues an export job, and the user is supposed to receive an email with a download link. In preview, the controller works, the job enqueues, and either it never runs (because the worker is disabled) or it runs against test infrastructure that doesn’t have the email provider configured. The feature is unreviewable.

Pattern 1: One queue per preview

Hard isolation. Each preview gets its own queue backend — a dedicated Redis instance, a dedicated SQS queue, a dedicated Postgres database for the queue tables. Not “the same Redis with a key prefix.” A separate instance.

The cost is a small fraction of what a single contamination bug costs in debugging time. See “Per-Preview Database vs. Shared Dev Database” for the same argument applied to the primary data store.

Sidekiq.configure_server do |config|
  config.redis = { url: ENV.fetch("SIDEKIQ_REDIS_URL") }
end

Sidekiq.configure_client do |config|
  config.redis = { url: ENV.fetch("SIDEKIQ_REDIS_URL") }
end

When the preview is destroyed, the queue goes with it. No orphaned jobs.
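
The same wiring can be sketched for a Python worker. This is a minimal sketch under stated assumptions, not a definitive implementation: QUEUE_REDIS_URL and SHARED_REDIS_URL are hypothetical variable names, and PREVIEW_ENV counts as set whenever it is non-empty. The idea is to fail at boot rather than quietly fall back to a shared instance.

```python
import os


class QueueConfigError(RuntimeError):
    """Raised when a worker would start without an isolated queue backend."""


def resolve_queue_url(env=None):
    """Return the queue URL for this environment, or refuse to start.

    QUEUE_REDIS_URL and SHARED_REDIS_URL are hypothetical names used for
    illustration; substitute whatever your platform injects.
    """
    env = os.environ if env is None else env
    url = env.get("QUEUE_REDIS_URL")
    if not url:
        # No implicit default: a missing URL must never mean "use the shared one".
        raise QueueConfigError("QUEUE_REDIS_URL is not set")

    in_preview = bool(env.get("PREVIEW_ENV"))
    shared = env.get("SHARED_REDIS_URL")
    if in_preview and shared and url == shared:
        raise QueueConfigError("preview is pointed at the shared Redis instance")
    return url
```

Calling resolve_queue_url() once at worker startup and passing the result to the job runner's broker setting keeps the check in one place.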

Pattern 2: Suppress scheduled jobs in previews

Cron-style schedules should not fire in preview environments by default. Whatever scheduling library you use, gate it on an environment flag.

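import os

from celery.schedules import crontab

# Celery beat schedule (Django settings).
# Any non-empty PREVIEW_ENV value disables every scheduled task.
CELERY_BEAT_SCHEDULE = {} if os.environ.get("PREVIEW_ENV") else {
    "daily-billing-rollup": {
        "task": "billing.tasks.daily_rollup",
        "schedule": crontab(hour=2, minute=0),
    },
}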

Same pattern in sidekiq-cron, node-cron, Hangfire’s RecurringJob, and Oban’s Cron plugin. If a specific scheduled job is part of what you’re testing, the PR can opt it in. The default needs to be off — one accidental cron run against shared infrastructure costs more than the inconvenience of opting in.
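
One way to let a PR opt a single job back in is an allowlist. The sketch below assumes a hypothetical PREVIEW_CRON_ALLOW variable holding comma-separated schedule names; it is not a feature of Celery or any other scheduler, just a filter applied to the schedule dictionary before the scheduler sees it.

```python
import os


def schedule_for_environment(full_schedule, env=None):
    """Return the schedule entries that should be active in this environment.

    Outside previews the full schedule is returned. Inside a preview only
    names listed in PREVIEW_CRON_ALLOW (a hypothetical variable) survive,
    so the default is an empty schedule.
    """
    env = os.environ if env is None else env
    if not env.get("PREVIEW_ENV"):
        return dict(full_schedule)

    raw = env.get("PREVIEW_CRON_ALLOW", "")
    allowed = {name.strip() for name in raw.split(",") if name.strip()}
    return {name: entry for name, entry in full_schedule.items() if name in allowed}
```

In a Django settings file this would wrap the existing dictionary, for example CELERY_BEAT_SCHEDULE = schedule_for_environment(FULL_SCHEDULE), where FULL_SCHEDULE is whatever name you give the production schedule.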

Pattern 3: Time-bound job execution

Jobs that take longer than a preview’s lifetime are a different problem. Three reasonable responses.

Give workers a hard timeout shorter than the preview’s idle-shutdown timer. If your preview shuts down after 4 hours of inactivity, kill any job longer than 30 minutes. Failing loudly beats dying silently mid-execution.

Mark jobs as preview-aware and run an abbreviated version. A “reindex all documents” job that takes 6 hours in production should run on a 100-document sample in preview. Pass a PREVIEW_ENV flag and let it short-circuit.

Or — where most teams should land — redesign jobs that aren’t testable in preview-friendly time bounds. If the only way to verify a job works is to run it for 6 hours, you’ve built something untestable. Break it up.
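
The first two responses can be combined in one job. The sketch below is illustrative only: reindex, index_one, and JobBudgetExceeded are hypothetical names, and the budget is a cooperative check between documents, not a hard kill. If you need the runner itself to enforce the limit, Celery's time_limit and soft_time_limit task options are the usual place to look.

```python
import time


class JobBudgetExceeded(RuntimeError):
    """Raised when a job runs past its preview time budget."""


def reindex(documents, index_one, preview=False, sample_size=100,
            budget_seconds=1800.0, clock=time.monotonic):
    """Index documents, sampling in preview and failing loudly past the budget.

    index_one is the caller's per-document indexing function. The defaults
    mirror the numbers in the text: a 100-document sample and a 30-minute cap.
    """
    batch = documents[:sample_size] if preview else documents
    deadline = clock() + budget_seconds
    indexed = 0
    for doc in batch:
        # Checked between documents, so one slow document can still overrun.
        if clock() > deadline:
            raise JobBudgetExceeded(
                f"reindex exceeded {budget_seconds}s after {indexed} documents"
            )
        index_one(doc)
        indexed += 1
    return indexed
```

Passing the clock as a parameter keeps the budget logic testable without sleeping.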

Pattern 4: Synthetic invocation for end-to-end flows

The user-clicks-button → job-runs → user-sees-result flow drives reviewers crazy. Make the job invocation observable and the result inspectable from the preview.

First, expose a debug page in the preview that shows recent job runs, their status, arguments, and results. Sidekiq Web, Hangfire Dashboard, Bull Board, Flower for Celery — all exist for this reason. Make them accessible to reviewers, not just engineers.

Second, when a job’s result is delivered out-of-band (email, webhook, SMS), redirect that delivery to something inspectable inside the preview. MailHog catches email. A webhook receiver running in the preview catches outgoing webhooks. SMS goes to a test mode that surfaces the message in the preview UI. See “Stripe, Twilio, and SendGrid in Test Mode for Previews.”
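
A webhook receiver does not need to be elaborate. Here is a minimal sketch using only the Python standard library; CaptureHandler, start_capture_server, and send_webhook are hypothetical names, and the in-memory CAPTURED list stands in for whatever store your debug page would read from.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory log of captured webhooks; a real preview might persist these
# and render them on a debug page.
CAPTURED = []


class CaptureHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        CAPTURED.append({"path": self.path, "body": body})
        self.send_response(204)  # acknowledge with no response body
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the preview's logs quiet


def start_capture_server(port=0):
    """Start the receiver on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), CaptureHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_webhook(host, port, path, payload):
    """Stand-in for the application's outgoing webhook call."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("POST", path, body=json.dumps(payload),
                     headers={"Content-Type": "application/json"})
        return conn.getresponse().status
    finally:
        conn.close()
```

In a preview, the application's webhook target would be pointed at this receiver through configuration, so the job code itself does not change.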

Pattern 5: Fast startup, clean shutdown

Workers need to boot quickly so the first review request after deploy doesn’t feel broken, and drain cleanly on SIGTERM.

# Sidekiq (config/sidekiq.yml)
:timeout: 25       # seconds to wait for jobs on SIGTERM
:concurrency: 5    # smaller than production

Match the worker’s drain timeout to the platform’s shutdown window. Kubernetes defaults to a 30-second termination grace period, and Heroku likewise allows 30 seconds between SIGTERM and SIGKILL. Set the drain timeout a few seconds shorter (25 seconds in the config above) so jobs have a chance to finish or be re-enqueued before the container is killed.
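
For a hand-rolled Python worker, the drain behavior can be sketched as a stop flag checked between jobs. DrainingWorker and install_sigterm_handler are hypothetical names, and the sketch deliberately leaves out the drain timeout itself: it only shows the worker finishing the job in hand, declining new ones, and handing back what is left for re-enqueueing.

```python
import signal
import threading
from collections import deque


class DrainingWorker:
    """Runs queued callables one at a time and stops pulling new ones on request."""

    def __init__(self, jobs=()):
        self.pending = deque(jobs)
        self.completed = []
        self._stop = threading.Event()

    def request_stop(self, signum=None, frame=None):
        # Signature matches what signal.signal() passes to a handler.
        self._stop.set()

    def run(self):
        # Finish the job in hand, but check the flag before taking another.
        while self.pending and not self._stop.is_set():
            job = self.pending.popleft()
            self.completed.append(job())
        # Whatever is left should be re-enqueued, not dropped.
        return list(self.pending)


def install_sigterm_handler(worker):
    # Call from the main thread; Python only allows signal handlers there.
    signal.signal(signal.SIGTERM, worker.request_stop)
```

Established runners such as Sidekiq and Celery already handle SIGTERM along these lines, so a sketch like this mostly matters for custom worker loops.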

What this gets you

Done right, background jobs in previews give reviewers a real version of the asynchronous half of your application. The export button works and produces an actual download link. The billing rollup can be triggered manually and inspected. Reviewers stop saying “I can’t actually test this end-to-end” and start trusting the preview as a complete verification surface.

Done wrong — shared queues, unsuppressed schedules, jobs that depend on hours of runtime — previews become a place where async features are unreviewable, which means they ship without anyone seeing them work.

If wiring all of this up sounds like more ops work than you signed up for, that’s what PreviewProof does. Per-preview Redis, suppressed schedules by default, inspectable job dashboards, SMTP and webhook capture wired in. Or build it yourself — the patterns above are most of what you need.