Backfill and Rebuild

PHP Event Sourcing Projection Backfill and Rebuild

The Problem

You deployed a new "order analytics" projection to production, but it only processes events from now on. You have 2 years of order history sitting in the event store. How do you populate the projection with historical data? And later, when you fix a bug in the projection logic, how do you replay everything?

Choosing the Right Strategy

Before reaching for rebuild, consider the lighter alternatives. The cheapest fix is the one that doesn't require replaying any history at all.

| Situation | Strategy | Cost |
| --- | --- | --- |
| Adding a column where historical rows can use a default value | Default value migration (no replay) | Five-minute deploy |
| Adding a brand-new projection that has no data yet | Backfill | One pass over history |
| Fixing a bug in a small projection where downtime during rebuild is acceptable | Rebuild (in-place) | Read model empty during rebuild |
| Fixing a bug in a large or user-facing projection | Run a new version (v2) alongside v1 | v1 keeps serving while v2 catches up |

The No-Rebuild Tactic: Default Values

If you are adding a new column to a projection, the first conversation to have is with Product, not with ops: can historical rows use a default value?

Often yes. "We're adding a priority column — historical tickets without explicit priority can show as normal" is a five-minute deploy: extend #[ProjectionInitialization] with an idempotent migration, and the new handler computes the real value for events from now on.

```php
#[ProjectionInitialization]
public function init(): void
{
    $this->connection->executeStatement(<<<SQL
        CREATE TABLE IF NOT EXISTS ticket_list (
            ticket_id VARCHAR(36) PRIMARY KEY,
            ticket_type VARCHAR(25),
            status VARCHAR(25)
        )
    SQL);

    $this->connection->executeStatement(<<<SQL
        ALTER TABLE ticket_list
        ADD COLUMN IF NOT EXISTS priority VARCHAR(25) NOT NULL DEFAULT 'normal'
    SQL);
}
```

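#[ProjectionInitialization] re-runs on every deploy, and IF NOT EXISTS keeps both statements idempotent. Historical rows get 'normal'; new tickets get their real priority from the updated handler. No rebuild, no backfill, no downtime. Note that ADD COLUMN IF NOT EXISTS is accepted by PostgreSQL and MariaDB but not by MySQL, so on MySQL the migration has to check for the column before altering the table.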

This will not always work — sometimes you genuinely need to recompute historical rows. But when it does, it skips the entire backfill/rebuild discussion.

Backfill — Populating a New Projection

Backfill processes all historical events from position 0 to the current position. It's used when you deploy a fresh projection and need to populate it with past data.

Sync Backfill

Add #[ProjectionBackfill] to your projection, then run the backfill CLI command.

The backfill reads all events from the beginning of the stream, processing them in configurable batches. After backfill completes, the projection is caught up and will process new events as they arrive.
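The batched read described above can be pictured with a small plain-PHP sketch. It does not use Ecotone; the `OrderAnalytics` class, the `backfill()` function, and the event shapes are hypothetical stand-ins chosen only to show how a replay from position 0 advances in batches.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a projection: counts orders and sums revenue.
final class OrderAnalytics
{
    public int $orders = 0;
    public int $revenueCents = 0;
    public int $position = 0; // number of events already processed

    public function apply(array $event): void
    {
        if ($event['type'] === 'OrderPlaced') {
            $this->orders++;
            $this->revenueCents += $event['amountCents'];
        }
    }
}

// Replays events from the projection's position to the end of the stream,
// one batch at a time. Returns how many batches were read.
function backfill(array $stream, OrderAnalytics $projection, int $batchSize): int
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    $batches = 0;
    while ($projection->position < count($stream)) {
        $batch = array_slice($stream, $projection->position, $batchSize);
        foreach ($batch as $event) {
            $projection->apply($event);
        }
        $projection->position += count($batch); // remember progress after each batch
        $batches++;
    }

    return $batches;
}

$stream = [
    ['type' => 'OrderPlaced', 'amountCents' => 1000],
    ['type' => 'OrderPlaced', 'amountCents' => 2500],
    ['type' => 'OrderCancelled'],
    ['type' => 'OrderPlaced', 'amountCents' => 500],
    ['type' => 'OrderPlaced', 'amountCents' => 1500],
];

$projection = new OrderAnalytics();
$batches = backfill($stream, $projection, 2);

printf("%d batches, %d orders, position %d\n", $batches, $projection->orders, $projection->position);
```

Because the position is stored after every batch, an interrupted run in this sketch would resume from the last completed batch instead of starting over.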

Async Backfill (Enterprise)

For large event stores with millions of events, synchronous backfill may take too long, because it runs in the CLI process and blocks until all events are processed. When asyncChannelName is set, the backfill command instead dispatches messages to a channel, turning the backfill into an asynchronous background process.

Run the backfill command (it only dispatches messages, so it returns immediately), then start workers to process them.

Scaling Async Backfill with Partitioned Projections

The real power of async backfill comes when combined with #[Partitioned]. Each partition (aggregate) can be backfilled independently, so the work is split into batches that multiple workers process in parallel.

When you run the backfill command with 10,000 aggregates and backfillPartitionBatchSize: 100:

  1. Ecotone dispatches 100 messages to backfill_channel (10,000 / 100)

  2. Each message backfills 100 partitions

  3. Start 4 workers → 4 batches processed in parallel → 4x faster

  4. Start 10 workers → 10x faster
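The arithmetic in the list above can be checked with a short plain-PHP sketch. The helper names `planBackfillMessages()` and `rounds()` are hypothetical, and the speed-up figure assumes every batch costs about the same and that workers are the only bottleneck.

```php
<?php
declare(strict_types=1);

// Splits partition (aggregate) IDs into batches; each batch stands for one message.
function planBackfillMessages(array $aggregateIds, int $partitionBatchSize): array
{
    if ($partitionBatchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    return array_chunk($aggregateIds, $partitionBatchSize);
}

// Number of sequential rounds needed when each worker handles one message at a time.
function rounds(int $messageCount, int $workers): int
{
    return (int) ceil($messageCount / $workers);
}

$aggregateIds = array_map(static fn (int $i): string => "order-$i", range(1, 10000));
$messages = planBackfillMessages($aggregateIds, 100);

printf("messages: %d\n", count($messages));
printf("rounds with 1 worker: %d\n", rounds(count($messages), 1));
printf("rounds with 4 workers: %d\n", rounds(count($messages), 4));
printf("rounds with 10 workers: %d\n", rounds(count($messages), 10));
```

With 100 messages, one worker needs 100 rounds, four workers need 25, and ten workers need 10, which is where the 4x and 10x figures come from.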

Async backfill is available as part of Ecotone Enterprise.

Rebuild — Reset and Replay (Enterprise)

Rebuild is different from backfill: it resets an existing projection (clears data and position) and then replays all events from the beginning.

Use rebuild when:

  • You fixed a bug in a handler and the Read Model has incorrect data

  • You changed the projection's schema and need to reprocess everything

  • You want to add a new event handler to an existing projection and apply it retroactively

Rebuild is available as part of Ecotone Enterprise.

How rebuild works depends on the projection type — and the difference is significant.

Rebuilding a Global Projection

For a globally tracked projection, rebuild works as reset + backfill on the entire dataset:

  1. #[ProjectionReset] is called — clears all data (e.g., DELETE FROM ticket_list)

  2. Position is reset to the beginning

  3. All events in the stream are replayed through the handlers
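The three steps above can be sketched in plain PHP without Ecotone. The `TicketListProjection` class, the `rebuildGlobal()` function, and the event shapes are hypothetical; an in-memory array stands in for the ticket_list table.

```php
<?php
declare(strict_types=1);

final class TicketListProjection
{
    /** @var array<string, string> ticket ID => status (stand-in for ticket_list) */
    public array $rows = [];
    public int $position = 0;

    public function apply(array $event): void
    {
        if ($event['type'] === 'TicketRegistered') {
            $this->rows[$event['ticketId']] = 'open';
        } elseif ($event['type'] === 'TicketClosed') {
            $this->rows[$event['ticketId']] = 'closed';
        }
    }

    // Plays the role of the reset hook: clears all projected data.
    public function reset(): void
    {
        $this->rows = [];
    }
}

function rebuildGlobal(TicketListProjection $projection, array $stream): void
{
    $projection->reset();         // step 1: clear all data
    $projection->position = 0;    // step 2: reset the position
    foreach ($stream as $event) { // step 3: replay every event
        $projection->apply($event);
        $projection->position++;
    }
}

// A read model left in a wrong state by an earlier, buggy handler.
$projection = new TicketListProjection();
$projection->rows = ['t-1' => 'open', 't-2' => 'open', 'ghost' => 'open'];
$projection->position = 3;

$stream = [
    ['type' => 'TicketRegistered', 'ticketId' => 't-1'],
    ['type' => 'TicketRegistered', 'ticketId' => 't-2'],
    ['type' => 'TicketClosed', 'ticketId' => 't-1'],
];

rebuildGlobal($projection, $stream);
print_r($projection->rows);
```

Between step 1 and the end of step 3 the whole read model is empty or incomplete, which is the downtime cost of rebuilding a global projection.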

Rebuilding a Partitioned Projection

For partitioned projections, rebuild is much safer. Instead of resetting the entire projection at once, Ecotone rebuilds each partition (aggregate) separately:

  1. For each partition: within a transaction, delete that partition's projected data and re-project it

  2. Other partitions are unaffected — they continue serving reads normally

  3. Only one aggregate's data is unavailable at a time, and only briefly

The key difference: in a partitioned projection, the #[ProjectionReset] method receives the aggregate ID through #[PartitionAggregateId], so it deletes only the data for the aggregate being rebuilt, not the entire table.
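A minimal sketch of a partition-scoped reset follows. The attribute names come from the text, but their namespaces are not shown there, so the `use` imports are omitted; PHP does not load attribute classes until they are reflected, so the snippet still runs standalone. Placing #[PartitionAggregateId] on a method parameter is an assumption, and the class, the `rebuildPartition()` helper, and the in-memory storage are hypothetical.

```php
<?php
declare(strict_types=1);

final class TicketDetailsProjection
{
    /** @var array<string, array{status: string}> rows keyed by ticket (aggregate) ID */
    public array $rows = [];

    // Assumed shape: the reset hook is handed the ID of the partition being rebuilt.
    #[ProjectionReset]
    public function reset(#[PartitionAggregateId] string $aggregateId): void
    {
        unset($this->rows[$aggregateId]); // delete one aggregate's data, not the table
    }

    public function apply(string $aggregateId, array $event): void
    {
        if ($event['type'] === 'TicketRegistered') {
            $this->rows[$aggregateId] = ['status' => 'open'];
        } elseif ($event['type'] === 'TicketClosed') {
            $this->rows[$aggregateId]['status'] = 'closed';
        }
    }
}

// Hypothetical helper: a database-backed version would wrap both steps in one transaction.
function rebuildPartition(TicketDetailsProjection $projection, string $aggregateId, array $events): void
{
    $projection->reset($aggregateId);
    foreach ($events as $event) {
        $projection->apply($aggregateId, $event);
    }
}

$projection = new TicketDetailsProjection();
$projection->rows = [
    't-1' => ['status' => 'open'], // stale: this ticket was actually closed
    't-2' => ['status' => 'open'],
];

rebuildPartition($projection, 't-1', [
    ['type' => 'TicketRegistered'],
    ['type' => 'TicketClosed'],
]);

print_r($projection->rows);
```

Only the row for t-1 is touched; t-2 stays readable for the entire run.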

Controlling Rebuild Batch Size

The partitionBatchSize parameter controls how many partitions are processed per rebuild command.

With 1000 aggregates and partitionBatchSize: 50, Ecotone dispatches 20 rebuild commands — each processing 50 partitions.

Scaling Rebuild with Async Workers

For large projections, you can distribute rebuild work across multiple workers.

When you run ecotone:projection:rebuild ticket_details:

  1. Ecotone counts the partitions (e.g., 1000 aggregates)

  2. Divides them into batches of 50 → 20 messages

  3. Sends all 20 messages to rebuild_channel

  4. Multiple workers consume from rebuild_channel in parallel

  5. Each worker rebuilds its batch of 50 partitions independently
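The flow above can be simulated in plain PHP. This is only an illustration under stated assumptions: an `SplQueue` stands in for rebuild_channel, and the workers are simulated by taking turns in a loop rather than running as real parallel processes.

```php
<?php
declare(strict_types=1);

// Steps 1-3: count partitions, split them into batches of 50, send one message per batch.
$partitionIds = array_map(static fn (int $i): string => "ticket-$i", range(1, 1000));

$channel = new SplQueue(); // stand-in for rebuild_channel
foreach (array_chunk($partitionIds, 50) as $batch) {
    $channel->enqueue($batch);
}
$messageCount = count($channel);

// Steps 4-5: workers pull messages until the channel is empty.
$workerCount = 4;
$handledPerWorker = array_fill(1, $workerCount, 0);
$rebuiltBy = []; // partition ID => number of the worker that rebuilt it

$worker = 1;
while (!$channel->isEmpty()) {
    $batch = $channel->dequeue();
    foreach ($batch as $partitionId) {
        $rebuiltBy[$partitionId] = $worker; // each partition is rebuilt exactly once
    }
    $handledPerWorker[$worker]++;
    $worker = $worker % $workerCount + 1;   // next worker takes the next message
}

printf("messages: %d, partitions rebuilt: %d\n", $messageCount, count($rebuiltBy));
print_r($handledPerWorker);
```

Each message is consumed by exactly one worker, so no partition is rebuilt twice, and adding workers shortens the run without changing the result.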

This means you can rebuild a projection with millions of aggregates by scaling up your worker count. As with async backfill, throughput grows roughly linearly with the number of workers, as long as the database and the message channel keep up.

Run the rebuild command, then start workers.

Sync Rebuild

Without asyncChannelName, rebuild runs synchronously: all partitions are processed in the current process.

Backfill vs Rebuild

| | Backfill | Rebuild (Global) | Rebuild (Partitioned) |
| --- | --- | --- | --- |
| Purpose | Populate a new, empty projection | Fix existing projection | Fix existing projection |
| Starting state | Fresh (no data) | All data cleared first | Per-partition data cleared |
| Calls reset? | No | Yes (entire table) | Yes (per aggregate) |
| Impact during run | None (table is new) | Table empty until done | Only one aggregate briefly affected |
| Parallel workers? | Via async backfill | Via async channel | Via async channel + partition batches |
| When to use | First deployment | Bug fix (simple projections) | Bug fix (production, at scale) |
| Open source? | Yes (sync) | Enterprise | Enterprise |
