
Should Projectors Serve Data?

Yes. For most systems, projectors should serve their own data. Run the projector and its gRPC query endpoint in the same pod. Split them when scaling demands it—not before.

This applies more broadly: collocate components that don’t yet need separation, define interfaces as if they were separate, and split when reality demands it. Angzarr will soon apply this same principle to sagas, allowing them to run directly inside aggregate command handlers—with a clean extraction path when they outgrow it.

Put the projector, read store, and gRPC query service in one pod:

graph TD
    subgraph Pod
        P[Projector<br/>event consumer] -->|writes| RS[(Read Store)]
        RS -->|reads| G[gRPC Service<br/>query endpoint]
    end

When query load overwhelms the pod, or projection lag degrades query latency, or you need to scale reads independently—pull the gRPC service into its own pod:

graph TD
    subgraph Projector Pod
        P[Projector<br/>event consumer] -->|writes| RS[(Read Store)]
    end
    subgraph Query Service Pod
        G[gRPC Service<br/>query endpoint] -->|reads| RS
    end

The interface doesn’t change. Clients don’t know the difference. You’ve scaled without redesigning.

The gRPC service has its own interface definition from day one. The read store already sits between the projector and the query logic. Splitting them apart is a deployment change, not an architecture change—you’re moving a process boundary, not redesigning a system.
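As a sketch, a day-one query contract might look like the following (the service and message names here are hypothetical, not Angzarr's actual API). Because clients bind to this definition rather than to a deployment topology, moving the service into its own pod later changes nothing they can see:

```proto
syntax = "proto3";

// Hypothetical read-side query contract, defined on day one even while
// the projector and query service share a pod.
service PlayerQuery {
  rpc GetPlayer(GetPlayerRequest) returns (GetPlayerResponse);
}

message GetPlayerRequest {
  string player_id = 1;
}

message GetPlayerResponse {
  string player_id = 1;
  // Exposing the last projected sequence lets clients reason about
  // projection lag without knowing how the read side is deployed.
  uint64 last_projected_sequence = 2;
}
```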

Separating the projector from the query service in a low-traffic system buys you an extra pod to deploy, monitor, and debug; network hops you didn’t need; a coordination problem when the read schema changes; and complexity that exists to solve a scaling problem you don’t have.

This is a tradeoff between architectural correctness and operational complexity. Simplicity should generally win, as long as the decision can be reversed once it starts to matter. The “correct” architecture solves real problems—independent scaling, isolation of projection lag, read model rebuilds without serving impact—but most systems don’t have those problems yet.

Angzarr will soon support incorporating sagas directly into aggregate roots and command handlers. Same motivation: for simple sagas tightly coupled to aggregate logic and not under independent load pressure, a separate saga pod is overhead without benefit. The aggregate handles the command, emits events, and performs the coordination—all in one place.

The constraint is identical: it must be easy to peel back out. When the saga becomes complex, when its scaling needs diverge, when a different team needs to own it—extraction should be straightforward. The saga’s interface is already defined. Its coordination logic is already encapsulated. Moving it to its own process is a deployment decision, not a rewrite.

Not everything should start collocated. Split immediately when:

  • Load profiles are already divergent. Hundreds of events per second into the projector, millions of queries per second out—these need independent scaling from the start.
  • Different teams own the read and write paths. Conway’s Law applies. Shared pods across team boundaries create deployment coupling.
  • The read model serves latency-critical paths. If projection rebuilds can’t impact query latency, process isolation is a correctness requirement, not an optimization.
  • Compliance or security boundaries require it. Some read models serve sensitive data through restricted endpoints where process isolation is policy.

These are conditions you can evaluate at design time. If none apply, start simple.

  1. Start simple. Collocate components that don’t yet need separation.
  2. Define interfaces as if they were separate. gRPC services, saga protocols, clear boundaries in code.
  3. Split when the pressure appears. Scaling bottlenecks, team ownership changes, reliability requirements.
  4. The split is mechanical, not architectural. Because the interfaces already exist.

Build for the system you have. Design interfaces for the system you might need. Deploy the simplest topology that works.


This post is part of an ongoing series on pragmatic architecture decisions in event-sourced systems. The opinions are informed by building Angzarr and deploying it in production—where elegance matters less than operability.

Angzarr Core 0.3.0: Facts Over State, Edition Branching, and 18K Lines Deleted

Angzarr Core 0.3.0 ships today with a fundamental shift in how sagas and process managers handle cross-aggregate coordination: they now receive sequence numbers instead of full event books. This “facts over state rebuilding” change aligns coordinators with the framework’s core philosophy. The release also adds explicit divergence support for edition branching, enabling counterfactual “what-if” scenarios at any point in an aggregate’s timeline.

Saga and Process Manager Protocol Update. Handlers now receive destination_sequences—a map of domain to next sequence number—instead of full EventBook state. This is a breaking proto change. Coordinators stamp commands with sequences and let aggregates decide; they no longer rebuild destination state themselves.

Edition Branching with Explicit Divergence. New branches can now specify an exact divergence point from the main timeline. The storage layer reads events from main up to sequence N, returning them as base state for the new branch. Use case: “What if I had folded at sequence 3 instead of calling?”

Cascade Two-Phase Commit. Merged from the core-cascade-improvements branch, adding cascade_id and committed fields to EventPage, stale cascade cleanup via CascadeReaper, and conflict detection for distributed transactions.

OpenTelemetry 0.31 and Tonic 0.14. Updated to latest observability and gRPC stacks with breaking API changes handled throughout.

Security Fixes. Critical gRPC-Go vulnerabilities patched in the gateway (v1.70.0 → v1.79.3), plus high-severity fixes in AWS-LC and quinn-proto.

18,000 Lines Removed. Standalone mode deleted entirely. The framework now exclusively uses the distributed coordinator architecture.

The proto changes require regenerating client code:

// Before (0.2.x)
message SagaHandleRequest {
  repeated EventBook destinations = 4;
}

// After (0.3.0)
message SagaHandleRequest {
  map<string, uint64> destination_sequences = 4;
}

Same pattern for ProcessManagerHandleRequest.destination_sequences.

Handlers that previously iterated over destination event books to determine state must now use sequences directly. The philosophy: coordinators deal in facts (sequences), not state reconstruction.
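A minimal sketch of what that handler change looks like, with illustrative types rather than Angzarr's actual API: the handler no longer folds over destination event books, it just stamps each outbound command with the destination's next sequence from the map.

```rust
use std::collections::HashMap;

// Hypothetical command type; the real Angzarr types differ.
#[derive(Debug, PartialEq)]
struct Command {
    domain: String,
    sequence: u64,
    name: String,
}

// Saga handler sketch: stamp each outbound command with the destination's
// next sequence, treating the sequence as a fact rather than rebuilding
// destination state from event history.
fn stamp_commands(
    destination_sequences: &HashMap<String, u64>,
    outbound: &[(&str, &str)], // (destination domain, command name)
) -> Vec<Command> {
    outbound
        .iter()
        .filter_map(|&(domain, name)| {
            // Skip commands for domains the coordinator has no fact about.
            destination_sequences.get(domain).map(|&seq| Command {
                domain: domain.to_string(),
                sequence: seq,
                name: name.to_string(),
            })
        })
        .collect()
}

fn main() {
    let mut seqs = HashMap::new();
    seqs.insert("player".to_string(), 7);
    println!("{:?}", stamp_commands(&seqs, &[("player", "Fold")]));
}
```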

The previous design had sagas receiving full event books for destination aggregates. This created several problems:

  1. Unnecessary coupling. Sagas knew how to interpret destination domain events.
  2. Performance overhead. Loading event history for every coordination step.
  3. Philosophy violation. Sagas are coordinators, not domain experts.

The new design treats sequences as facts. A saga knows “Player aggregate is at sequence 7” without knowing what happened in sequences 1-6. It stamps the outbound command with sequence 7, and the aggregate validates whether that sequence is still current.

If the sequence has advanced (concurrent modification), the aggregate rejects the command. The saga retries with fresh sequences. No domain logic leaked into the coordinator.
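The aggregate-side check reduces to an optimistic concurrency comparison. A minimal sketch, assuming the stamped sequence must equal the aggregate's current one (function and error shape are illustrative, not Angzarr's actual API):

```rust
// Hypothetical aggregate-side validation of a coordinator-stamped command.
fn accept_command(current_sequence: u64, stamped_sequence: u64) -> Result<u64, String> {
    if stamped_sequence == current_sequence {
        // The fact is still fresh: accept the command and advance.
        Ok(current_sequence + 1)
    } else {
        // Concurrent modification: reject so the saga can retry with
        // fresh sequences.
        Err(format!(
            "stale sequence {stamped_sequence}, aggregate is at {current_sequence}"
        ))
    }
}

fn main() {
    assert_eq!(accept_command(7, 7), Ok(8));
    assert!(accept_command(9, 7).is_err());
}
```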

Editions enable counterfactual reasoning within event-sourced systems. The new explicit divergence support makes this practical:

// Create branch diverging at sequence 3
let edition = Edition {
    domain_divergence: Some(DomainDivergence {
        sequence: 3,
        // Branch sees events 1-3 from main, then diverges
    }),
    ..Default::default()
};

Key implementation details:

  • EventStore::get_with_divergence() reads the main timeline up to the divergence point
  • Snapshots accelerate loading state up to the divergence point
  • The aggregate applies events 1-3, then processes new commands on the branch
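The loading behavior described above can be reduced to a simple sketch (assumed semantics, not the actual EventStore implementation): the branch's base state is the main timeline truncated at the divergence point, here modeled as bare sequence numbers.

```rust
// Simplified sketch of divergent branch loading: keep main-timeline
// events up to and including the divergence sequence, drop the rest.
fn base_events_for_branch(main_sequences: &[u64], divergence: u64) -> Vec<u64> {
    main_sequences
        .iter()
        .copied()
        .filter(|&seq| seq <= divergence)
        .collect()
}

fn main() {
    // A branch diverging at sequence 3 sees main events 1-3, nothing after.
    assert_eq!(base_events_for_branch(&[1, 2, 3, 4, 5], 3), vec![1, 2, 3]);
}
```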

Use cases include game replay analysis, regulatory “what-if” scenarios, and training data generation from production event streams.

Two-phase commit for cross-aggregate operations is now first-class:

  • CascadeReaper cleans up stale cascades that failed mid-transaction
  • Conflict detection identifies when concurrent cascades touch the same aggregates
  • Query methods (query_stale_cascades, query_cascade_participants) added to all storage backends
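The reaper's core decision can be sketched as a staleness predicate (struct and field names here are illustrative, in the spirit of CascadeReaper rather than its actual implementation): an uncommitted cascade older than the timeout is a cleanup candidate.

```rust
use std::time::Duration;

// Hypothetical cascade record; the real EventPage fields differ.
struct Cascade {
    committed: bool,
    age: Duration,
}

// A cascade that never committed and has outlived the timeout is stale
// and should be cleaned up by the reaper.
fn is_stale(cascade: &Cascade, timeout: Duration) -> bool {
    !cascade.committed && cascade.age > timeout
}

fn main() {
    let hung = Cascade { committed: false, age: Duration::from_secs(600) };
    let done = Cascade { committed: true, age: Duration::from_secs(600) };
    assert!(is_stale(&hung, Duration::from_secs(300)));
    assert!(!is_stale(&done, Duration::from_secs(300)));
}
```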

This work also drove mutation testing improvements, pushing the mutation kill rate for the mock EventStore from 62.9% to 100%.

  1. Regenerate protos using Buf or your preferred tooling
  2. Update saga/PM handlers to use destination_sequences map
  3. Replace EventBook iteration with sequence stamping
  4. Test with explicit divergence if using editions

The cascade changes are additive—no migration required unless you’re adopting 2PC.

With standalone mode removed, the framework is fully committed to the distributed architecture. Upcoming work includes:

  • Multi-language client SDK stabilization (Go, Python, Rust, Java)
  • Snapshot retention policies (migration added in 0.3.0)
  • Enhanced cascade conflict resolution strategies

Full changelog: e1335ddc…908f9aad