Testing Strategy

Nothing is “done” until tests prove it works. Writing code without runnable tests is incomplete work.


Test-Driven Development is mandatory. Follow the Red-Green-Refactor cycle:

flowchart LR
    R[🔴 Red] --> G[🟢 Green] --> RF[🔵 Refactor] --> R

| Phase | Action |
|-------|--------|
| Red | Write a failing test first. Verify it fails for the right reason. Ensure test isolation. |
| Green | Write minimal code to pass. No extras, no premature optimization. |
| Refactor | Clean up while keeping tests green. Apply SOLID principles. Remove duplication. |

Critical: If a test fails, fix the issue and create a new commit. Never amend commits that have been pushed.


flowchart TB
    subgraph Acceptance["Acceptance Tests"]
        A1[Business behavior]
        A2[Full stack via Gherkin]
    end
    subgraph Contract["Contract Tests"]
        C1[Interface compliance]
        C2[Backend interchangeability]
    end
    subgraph Integration["Integration Tests"]
        I1[Framework internals]
        I2[In-process RuntimeBuilder]
    end
    subgraph Unit["Unit Tests"]
        U1[Pure functions]
        U2[No I/O]
    end

    Acceptance --> Contract --> Integration --> Unit

| Level | Scope | Speed | Infrastructure |
|-------|-------|-------|----------------|
| Unit | Single function/module | Fast (ms) | None |
| Integration | Framework plumbing | Fast (ms) | In-process (RuntimeBuilder, SQLite, channels) |
| Contract | Interface compliance | Medium (s) | Testcontainers |
| Acceptance | Business behavior | Medium (s) | In-process channel + SQLite or full stack (direct) |

Angzarr uses Cucumber/Gherkin extensively—not just for acceptance tests, but also for contract tests and integration tests. The primary motivation is readability.

Gherkin specifications are easier to read than programmatic tests:

illustrative
# Gherkin: Intent is immediately clear
Scenario: Events persist with correct sequence numbers
  Given an empty event store
  When I append 3 events to aggregate "player-123"
  Then the events have sequences 0, 1, 2
  And querying from sequence 1 returns 2 events
illustrative
// Programmatic: Requires understanding test framework idioms
#[tokio::test]
async fn test_events_persist_with_correct_sequence_numbers() {
    let store = setup_store().await;
    let root = uuid::Uuid::new_v4();

    for i in 0..3 {
        store.append(&root, make_event(i)).await.unwrap();
    }

    let events = store.get_from(&root, 1).await.unwrap();
    assert_eq!(events.len(), 2);
    assert_eq!(events[0].sequence, 1);
}

Both test the same thing, but the Gherkin version:

  • Documents behavior in plain English
  • Serves as living specification
  • Is reviewable by non-developers
  • Makes test coverage gaps obvious

| Component | Test Type | Harness | Why |
|-----------|-----------|---------|-----|
| Core (src/, tests/) | Contract, Integration | Rust cucumber-rs | Readable specs for framework behavior |
| Clients (client/{lang}/) | Contract | Unified Rust gRPC harness | One source of truth across 6 languages |
| Examples (examples/{lang}/) | Acceptance | Per-language harnesses | Demonstrative for developers |

Core framework tests use Gherkin for readability, not cross-language consistency:

illustrative - directory structure
tests/
├── interfaces/
│   ├── features/
│   │   ├── event_store.feature       # EventStore contract
│   │   ├── snapshot_store.feature    # SnapshotStore contract
│   │   ├── position_store.feature    # PositionStore contract
│   │   ├── event_bus.feature         # EventBus contract
│   │   └── dlq.feature               # DLQ behavior
│   └── steps/                        # Rust step definitions
├── client/
│   ├── features/
│   │   ├── aggregate-client.feature  # Client SDK contracts
│   │   ├── command-builder.feature
│   │   └── query-client.feature
│   └── (Rust harness calls clients via gRPC)
└── acceptance/
    └── features/
        └── end_to_end.feature        # Full stack scenarios

Client libraries are tested with a unified Rust gRPC harness:

flowchart TB
    H["Rust Gherkin Harness (cucumber-rs)<br/>- Step definitions: tests/client/<br/>- Feature files: client/features/*.feature"]
    H -->|gRPC| Py[Python Client]
    H -->|gRPC| Go[Go Client]
    H -->|gRPC| Rs[Rust Client]
    H -->|gRPC| Ja[Java Client]
    H -->|gRPC| Cs[C# Client]
    H -->|gRPC| Cpp[C++ Client]

Why unified?

  • One source of truth for SDK contracts
  • Same tests validate all 6 language implementations
  • Tests actual gRPC protocol, not internal APIs
client/features/aggregate_client.feature
# docs:start:aggregate_client_contract
Feature: AggregateClient - Command Execution
  The AggregateClient sends commands to aggregates for processing.
  Commands are validated, processed, and result in events being persisted.
  Supports async (fire-and-forget), sync, and speculative modes.
  Without command execution, the system cannot accept user actions or
  change aggregate state.
  # docs:end:aggregate_client_contract

  # docs:start:client_command
  Scenario: Execute command on new aggregate
    Given a new aggregate root in domain "orders"
    When I execute a "CreateOrder" command with data "customer-123"
    Then the command should succeed
    And the response should contain 1 event
    And the event should have type "OrderCreated"

  Scenario: Execute command on existing aggregate
    Given an aggregate "orders" with root "order-001" at sequence 3
    When I execute a "AddItem" command at sequence 3
    Then the command should succeed
    And the response should contain events starting at sequence 3
  # docs:end:client_command

  # docs:start:client_concurrency
  Scenario: Command at wrong sequence fails with precondition error
    Given an aggregate "orders" with root "order-002" at sequence 5
    When I execute a command at sequence 3
    Then the command should fail with precondition error
    And the error should indicate sequence mismatch

  Scenario: Concurrent writes are detected
    Given an aggregate "orders" with root "order-003" at sequence 0
    When two commands are sent concurrently at sequence 0
    Then one should succeed
    And one should fail with precondition error
  # docs:end:client_concurrency
client/features/command_builder.feature
# docs:start:command_builder_contract
Feature: CommandBuilder - Fluent Command Construction
  The CommandBuilder provides a fluent API for constructing commands.
  It handles serialization, correlation IDs, sequence numbers, and
  type URLs while providing compile-time and runtime validation.
  The builder pattern enables both OO-style client usage and can be
  adapted for router-based implementations.
  # docs:end:command_builder_contract

  Scenario: Build command with all required fields
    When I build a command for domain "orders" root "order-001"
    And I set the command type to "CreateOrder"
    And I set the command payload
    Then the built command should have domain "orders"
    And the built command should have root "order-001"
    And the built command should have type URL containing "CreateOrder"

  Scenario: Build with explicit correlation ID
    When I build a command for domain "orders"
    And I set correlation ID to "trace-123"
    And I set the command type and payload
    Then the built command should have correlation ID "trace-123"

  Scenario: Builder methods can be chained
    When I build a command using fluent chaining:
      """
      client.command("orders", root)
        .with_correlation_id("trace-456")
        .with_sequence(3)
        .with_command("CreateOrder", payload)
        .build()
      """
    Then the build should succeed
    And all chained values should be preserved
illustrative
just test-client python # Test Python client via Rust harness
just test-client go # Test Go client via Rust harness
just test-clients # Test all clients
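
To make the fluent-chaining scenario above concrete, here is a minimal Python sketch of a builder with the same method names (with_correlation_id, with_sequence, with_command, build). The Command fields, the generated fallback correlation ID, and the "type.googleapis.com/" prefix on the type URL are assumptions for illustration, not the SDK's actual implementation.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Command:
    """Hypothetical built command; field names are assumptions."""
    domain: str
    root: str
    type_url: str
    payload: bytes
    correlation_id: str
    sequence: int


class CommandBuilder:
    """Hypothetical fluent builder: every with_* method returns self."""

    def __init__(self, domain: str, root: str) -> None:
        self._domain = domain
        self._root = root
        self._correlation_id: Optional[str] = None
        self._sequence = 0
        self._type_name: Optional[str] = None
        self._payload: Optional[bytes] = None

    def with_correlation_id(self, correlation_id: str) -> "CommandBuilder":
        self._correlation_id = correlation_id
        return self

    def with_sequence(self, sequence: int) -> "CommandBuilder":
        self._sequence = sequence
        return self

    def with_command(self, type_name: str, payload: bytes) -> "CommandBuilder":
        self._type_name = type_name
        self._payload = payload
        return self

    def build(self) -> Command:
        # Runtime validation: refuse to build without a type and payload.
        if self._type_name is None or self._payload is None:
            raise ValueError("command type and payload are required")
        return Command(
            domain=self._domain,
            root=self._root,
            type_url=f"type.googleapis.com/{self._type_name}",
            payload=self._payload,
            # Assumption: generate an ID only when the caller set none.
            correlation_id=self._correlation_id or str(uuid.uuid4()),
            sequence=self._sequence,
        )


cmd = (
    CommandBuilder("orders", "order-001")
    .with_correlation_id("trace-456")
    .with_sequence(3)
    .with_command("CreateOrder", b"customer-123")
    .build()
)
```

Returning self from each setter is what lets the "all chained values should be preserved" step check every field on the one built command.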

Example implementations use per-language Gherkin harnesses:

illustrative - directory structure
examples/features/unit/*.feature (shared specifications)
├── Python: behave + examples/python/features/steps/
├── Go: godog + examples/go/tests/steps/
├── Rust: cucumber-rs + examples/rust/tests/
├── Java: cucumber-junit5 + examples/java/tests/
├── C#: SpecFlow + examples/csharp/Tests/Steps/
└── C++: cucumber-cpp + examples/cpp/tests/

Why per-language?

  • Demonstrative for non-polyglot developers
  • Developers see Gherkin AND step definitions in their language
  • Educational code they can learn from and copy
illustrative
just examples python test # behave
just examples go test # godog
just examples rust test # cucumber-rs

No external dependencies. Tests interact only with the system under test—no I/O, no concurrency, no infrastructure.

Test business logic directly. No mocks. No frameworks. No infrastructure.

Aggregates are classes: instantiate, seed state, call a @handles method, assert on the returned event. Appliers are equally testable — pass a fresh state, call the method, assert on the mutation.

  • Pass in state structs directly
  • Assert on returned events
  • No database connections, no message buses, no HTTP clients

If you’re writing mocks, you’re testing the wrong thing. Business logic should be deterministic methods that take data in and return data out.

illustrative - direct handler test
import pytest

# Player, DepositFunds, and CommandRejectedError come from the example's own modules.


def test_deposit_increases_bankroll():
    agg = Player()
    agg.state.registered = True
    agg.state.bankroll = 1000

    event = agg.handle_deposit_funds(DepositFunds(amount=500))

    assert event.new_bankroll == 1500


def test_deposit_rejected_when_not_registered():
    agg = Player()
    agg.state.registered = False

    with pytest.raises(CommandRejectedError):
        agg.handle_deposit_funds(DepositFunds(amount=500))

Why this works:

  • @handles methods are ordinary methods — call them directly
  • No mocking required: pass state directly, assert on returned events
  • Proto serialization tested separately from business logic
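
For readers who want to run the pattern without the SDK, the sketch below defines a stand-in Player aggregate using only dataclasses. Every class here (PlayerState, FundsDeposited, the applier name apply_funds_deposited) and the positive-amount rule are assumptions made for illustration; the real aggregates use generated proto types and the @handles decorator.

```python
from dataclasses import dataclass, field


class CommandRejectedError(Exception):
    """Stand-in for the SDK's rejection error (assumed name)."""


@dataclass
class DepositFunds:
    amount: int


@dataclass
class FundsDeposited:
    amount: int
    new_bankroll: int


@dataclass
class PlayerState:
    registered: bool = False
    bankroll: int = 0


@dataclass
class Player:
    state: PlayerState = field(default_factory=PlayerState)

    def handle_deposit_funds(self, cmd: DepositFunds) -> FundsDeposited:
        # Handler: read state, decide, return an event. No I/O, no mutation.
        if not self.state.registered:
            raise CommandRejectedError("player is not registered")
        if cmd.amount <= 0:
            raise CommandRejectedError("amount must be positive")
        return FundsDeposited(cmd.amount, self.state.bankroll + cmd.amount)

    def apply_funds_deposited(self, event: FundsDeposited) -> None:
        # Applier: fold an event that already happened into state.
        self.state.bankroll = event.new_bankroll


agg = Player(PlayerState(registered=True, bankroll=1000))
event = agg.handle_deposit_funds(DepositFunds(amount=500))
bankroll_before_apply = agg.state.bankroll  # handler left state untouched
agg.apply_funds_deposited(event)
```

Keeping the decision (handler) separate from the mutation (applier) is what makes both halves testable with plain values.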

Use the pattern: test_<action>_<condition>_<expected_result>

| Language | Convention | Example |
|----------|------------|---------|
| Python | snake_case | test_deposit_with_zero_amount_raises_error |
| Rust | snake_case | test_deposit_with_zero_amount_raises_error |
| Go | CamelCase | TestDepositWithZeroAmountRaisesError |
| Java/C# | camelCase | depositWithZeroAmountRaisesError |

Prioritize readability over rigid format. Tests are documentation.

Events cross a serialization boundary between business logic and the framework:

flowchart TB
    subgraph BL["Business Logic"]
        C["compute(cmd, state) → raw event"]
        BS["build_state(state, events)"]
        AE["_apply_event(state, event_any)"]
        MS[mutate state]
    end
    subgraph FW["Framework"]
        AP["Any.Pack(event)"]
        EB["EventBook.pages[].event<br/>(Any-wrapped)"]
        EX[extract events from pages]
        UP["event_any.Unpack(typed_event)"]
    end
    C --> AP
    AP -->|persist to EventBook| EB
    EB --> EX
    EX --> BS
    BS --> AE
    AE --> UP
    UP --> MS

The framework stores events as opaque Any blobs—it doesn’t know business types. Business logic must decode the Any because only it knows PlayerRegistered, FundsDeposited, etc.

Full event sourcing test cycle:

illustrative
from google.protobuf.any_pb2 import Any

# PlayerState, DepositFunds, compute, and build_state come from the example's own modules.


def test_deposit_full_cycle():
    # 1. Start with state
    state = PlayerState(bankroll=100)
    cmd = DepositFunds(amount=50)

    # 2. compute() produces raw event
    event = compute(cmd, state)

    # 3. Wrap in Any (what framework does for persistence)
    event_any = Any()
    event_any.Pack(event, type_url_prefix="type.googleapis.com/")

    # 4. build_state applies Any-wrapped events → new state
    new_state = build_state(state, [event_any])
    assert new_state.bankroll == 150

Tests mimic the production boundary exactly—no special test-only interfaces.
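
The decode step on the business-logic side can be sketched without protobuf. In the block below, Envelope is a stdlib stand-in for Any (a type URL plus opaque bytes, here JSON-encoded), and pack, _apply_event, and build_state are illustrative implementations rather than the framework's code. Skipping unknown event types is an assumption.

```python
import json
from dataclasses import asdict, dataclass, replace
from typing import List


@dataclass(frozen=True)
class PlayerState:
    bankroll: int = 0


@dataclass(frozen=True)
class FundsDeposited:
    amount: int


@dataclass(frozen=True)
class Envelope:
    """Stand-in for protobuf Any: type URL plus opaque payload bytes."""
    type_url: str
    value: bytes


def pack(event) -> Envelope:
    # Framework side: wrap the typed event before persisting it.
    name = type(event).__name__
    body = json.dumps(asdict(event)).encode()
    return Envelope(f"type.googleapis.com/{name}", body)


def _apply_event(state: PlayerState, event_any: Envelope) -> PlayerState:
    # Business side: only this code knows which type sits behind the URL.
    if event_any.type_url.endswith("/FundsDeposited"):
        event = FundsDeposited(**json.loads(event_any.value))
        return replace(state, bankroll=state.bankroll + event.amount)
    return state  # Assumption: unknown event types are skipped.


def build_state(state: PlayerState, events: List[Envelope]) -> PlayerState:
    for event_any in events:
        state = _apply_event(state, event_any)
    return state


packed = pack(FundsDeposited(amount=50))
new_state = build_state(PlayerState(bankroll=100), [packed])
```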


Test Angzarr framework internals using synthetic aggregates (EchoAggregate, MultiEventAggregate). Prove the plumbing works—not business logic.

Infrastructure: In-process only. Uses RuntimeBuilder with SQLite and channel bus. No containers.

What they cover:

  • Event persistence and sequence numbering
  • IPC event bus (named pipes, domain filtering)
  • gRPC over UDS transport
  • Channel bus pub/sub delivery
  • Saga activation and cross-domain command routing
  • Snapshot/recovery, lossy bus resilience
tests/interfaces/features/event_bus.feature
Feature: EventBus interface
  The EventBus distributes committed events to interested subscribers. After
  an aggregate persists events, the bus broadcasts them to sagas, projectors,
  and process managers that need to react.

  Background:
    Given an EventBus backend

  Scenario: Handlers only receive events from their subscribed domain
    Given the player-projector subscribes only to the player domain
    When events are published to player and table domains
    Then the player-projector receives only player events
    And never sees table events which are filtered out by the bus

  Scenario: Cross-domain handlers can subscribe to multiple domains
    Given the output-projector subscribed to player and table domains
    When events are published to player, table, and hand domains
    Then the output-projector receives player events because it subscribed
    And the output-projector receives table events because it subscribed
    And the output-projector does NOT receive hand events because it did not subscribe

  Scenario: Events arrive in sequence order from a single publisher
    Given a single-threaded hand aggregate publishing events
    And a projector subscribed to hand
    When events with sequences 0, 1, 2, 3, 4 are published in order
    Then the projector receives them in sequence order: 0, 1, 2, 3, 4
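
The domain-filtering behavior these scenarios describe can be illustrated with a few lines of Python. ChannelBus and Event below are hypothetical stand-ins, not Angzarr's channel bus; the sketch assumes synchronous delivery in publish order from a single publisher.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class Event:
    domain: str
    sequence: int


class ChannelBus:
    """Hypothetical in-memory bus that delivers synchronously."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[[Event], None]]] = (
            defaultdict(list)
        )

    def subscribe(self, domains: List[str], handler: Callable[[Event], None]) -> None:
        for domain in domains:
            self._handlers[domain].append(handler)

    def publish(self, event: Event) -> None:
        # Filtering happens here: handlers for other domains are never called.
        for handler in self._handlers.get(event.domain, []):
            handler(event)


bus = ChannelBus()
player_seen: List[Event] = []
output_seen: List[Event] = []
bus.subscribe(["player"], player_seen.append)
bus.subscribe(["player", "table"], output_seen.append)

for seq in range(3):
    for domain in ("player", "table", "hand"):
        bus.publish(Event(domain, seq))
```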

Verify all storage implementations behave identically. Same interface, interchangeable backends.

Infrastructure: Testcontainers for every networked backend (real Postgres, Redis, NATS, immudb); the SQLite backend runs without a container. This is the only test level that starts containers itself.

What contracts verify:

  • EventStore: add, get, get_from, get_from_to, list_roots, sequence numbering
  • SnapshotStore: save, load, delete
  • PositionStore: get, set, checkpoint semantics

Location: tests/interfaces/

Contract tests are written in Gherkin for readability. The same feature files run against every backend:

tests/interfaces/features/event_store.feature
Feature: EventStore interface
  The EventStore is the source of truth for all state changes in the system.
  Every aggregate's current state is derived by replaying its events. This
  immutability provides a complete audit trail, enables temporal queries, and
  allows the system to reconstruct any aggregate's state at any point in history.

  Background:
    Given an EventStore backend

  Scenario: First event in an aggregate's history starts at sequence 0
    Given an aggregate "player" with no events
    When I add 1 event to the aggregate
    Then the aggregate should have 1 event
    And the first event should have sequence 0

  Scenario: Multiple events from a single command receive consecutive sequences
    Given an aggregate "player" with no events
    When I add 5 events to the aggregate
    Then the aggregate should have 5 events
    And events should have consecutive sequences starting from 0

  Scenario: Concurrent writers are detected via sequence mismatch
    Given an aggregate "player" with 3 events
    When I try to add an event with sequence 1
    Then the operation should fail with a sequence conflict

  Scenario: Stale writers cannot overwrite history
    Given an aggregate "player" with 3 events
    When I try to add an event with sequence 0
    Then the operation should fail with a sequence conflict
illustrative
# Run against SQLite (fast, no containers)
STORAGE_BACKEND=sqlite cargo test --test interfaces --features sqlite
# Run against PostgreSQL (testcontainers)
STORAGE_BACKEND=postgres cargo test --test interfaces --features postgres
# Run against immudb (testcontainers)
STORAGE_BACKEND=immudb cargo test --test interfaces --features immudb

Contract tests ensure you can swap backends without behavior changes. If SQLite passes but Postgres fails, the Postgres implementation has a bug.
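
The idea of one contract run against interchangeable backends can be sketched in Python with two toy stores. The EventStore protocol, both implementations, and SequenceConflict are hypothetical and far simpler than the real interface (get_from returns only sequence numbers here); the point is that check_event_store_contract never knows which backend it received.

```python
import sqlite3
from typing import Callable, Dict, List, Protocol


class SequenceConflict(Exception):
    """Raised when a write does not target the next expected sequence."""


class EventStore(Protocol):
    def add(self, root: str, sequence: int, payload: bytes) -> None: ...
    def get_from(self, root: str, sequence: int) -> List[int]: ...


class MemoryEventStore:
    def __init__(self) -> None:
        self._events: Dict[str, List[bytes]] = {}

    def add(self, root: str, sequence: int, payload: bytes) -> None:
        events = self._events.setdefault(root, [])
        if sequence != len(events):
            raise SequenceConflict(f"expected {len(events)}, got {sequence}")
        events.append(payload)

    def get_from(self, root: str, sequence: int) -> List[int]:
        return list(range(sequence, len(self._events.get(root, []))))


class SqliteEventStore:
    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE events (root TEXT, sequence INTEGER, payload BLOB,"
            " PRIMARY KEY (root, sequence))"
        )

    def add(self, root: str, sequence: int, payload: bytes) -> None:
        (count,) = self._db.execute(
            "SELECT COUNT(*) FROM events WHERE root = ?", (root,)
        ).fetchone()
        if sequence != count:
            raise SequenceConflict(f"expected {count}, got {sequence}")
        self._db.execute(
            "INSERT INTO events VALUES (?, ?, ?)", (root, sequence, payload)
        )

    def get_from(self, root: str, sequence: int) -> List[int]:
        rows = self._db.execute(
            "SELECT sequence FROM events WHERE root = ? AND sequence >= ?"
            " ORDER BY sequence",
            (root, sequence),
        )
        return [row[0] for row in rows]


def check_event_store_contract(make_store: Callable[[], EventStore]) -> None:
    """One set of assertions, run unchanged against every backend."""
    store = make_store()
    for seq in range(3):
        store.add("player-123", seq, b"event")
    assert store.get_from("player-123", 1) == [1, 2]
    try:
        store.add("player-123", 1, b"stale")  # stale writer
    except SequenceConflict:
        pass
    else:
        raise AssertionError("stale write was accepted")
    # The rejected write must leave history untouched.
    assert store.get_from("player-123", 0) == [0, 1, 2]


for backend in (MemoryEventStore, SqliteEventStore):
    check_event_store_contract(backend)
```

If one backend passes and another fails, the failing backend is the one with the bug, since the assertions are shared.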

Contract tests use testcontainers to provision real databases:

illustrative
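// Imports assume a recent testcontainers release with the async runner.
use testcontainers::{
    core::{IntoContainerPort, WaitFor},
    runners::AsyncRunner,
    ContainerAsync, GenericImage, ImageExt,
};

async fn start_postgres() -> (ContainerAsync<GenericImage>, String) {
    let image = GenericImage::new("postgres", "16")
        .with_exposed_port(5432.tcp())
        .with_wait_for(WaitFor::message_on_stdout(
            "database system is ready",
        ));

    let container = image
        .with_env_var("POSTGRES_USER", "testuser")
        .with_env_var("POSTGRES_PASSWORD", "testpass")
        .with_env_var("POSTGRES_DB", "testdb")
        .start()
        .await
        .expect("Failed to start container");

    // Docker maps 5432 to a random host port; look it up after startup.
    let host_port = container.get_host_port_ipv4(5432).await.unwrap();
    let url = format!("postgres://testuser:testpass@localhost:{}/testdb", host_port);

    // Return the container too: dropping it stops the database.
    (container, url)
}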

Benefits:

  • Zero setup — tests start containers automatically
  • Isolation — each test gets fresh state
  • Realistic — tests run against real databases, not mocks

Test business behavior through the full stack. Written in Gherkin, describing what the system does from a business perspective.

Location: examples/rust/e2e/tests/features/

Gherkin is business-readable specification, not test code. Describe what the system does and why it matters—never how.

The litmus test: “Will this wording change if the implementation changes?” If yes, abstract to behavior.

illustrative
# Wrong: UI choreography
When I click "Add to Cart"
And I click "Checkout"
And I fill in "Card Number" with "4111..."
# Right: Business intent
When I purchase the items in my cart

| Keyword | Purpose | Example |
|---------|---------|---------|
| Given | Establish context (past state) | Given a player with $500 in their bankroll |
| When | Single triggering action | When the player reserves $200 for the table |
| Then | Verify business outcomes | Then the player's available balance is $300 |

| Technical (Avoid) | Business (Prefer) |
|-------------------|-------------------|
| API returns 201 | Order is confirmed |
| Database has record | Customer exists |
| Event is published | Notification is sent |
| State machine transitions | Hand progresses to showdown |

Exception: Framework tests (event stores, buses) use technical vocabulary—it’s their domain.

Each scenario tests exactly one thing. Multiple When-Then pairs = multiple scenarios.

Open features with context explaining what this capability enables, why it matters, and what breaks if it doesn’t work:

illustrative
Feature: Player fund reservation
  Players must reserve funds when joining a table. This ensures:
  - Players can cover their buy-in before sitting down
  - Funds are locked and cannot be double-spent across tables

  Without fund reservation, players could join multiple tables with
  the same bankroll, creating settlement disputes.

Don’t just test happy paths. Business rules live in constraints:

illustrative
Scenario: Cannot reserve more than available balance
  Given Alice has $500 available
  When Alice tries to reserve $600
  Then the request fails with "insufficient funds"
  And Alice's available balance remains $500
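
A constraint scenario like this one checks two things: the rejection itself and that state is unchanged afterwards. A minimal Python sketch of the rule follows; Wallet, reserve, and InsufficientFunds are hypothetical names, not part of the example code.

```python
from dataclasses import dataclass


class InsufficientFunds(Exception):
    """Hypothetical rejection raised when a reservation exceeds the balance."""


@dataclass(frozen=True)
class Wallet:
    available: int
    reserved: int = 0


def reserve(wallet: Wallet, amount: int) -> Wallet:
    """Return a new wallet with `amount` moved from available to reserved."""
    if amount > wallet.available:
        raise InsufficientFunds("insufficient funds")
    return Wallet(wallet.available - amount, wallet.reserved + amount)


alice = Wallet(available=500)
try:
    alice = reserve(alice, 600)
    outcome = "reserved"
except InsufficientFunds as err:
    # The failed attempt never rebinds `alice`, so her balance is intact.
    outcome = str(err)
```

Because Wallet is immutable, a rejected request cannot leave a half-applied reservation behind, which is what the final "remains $500" step verifies.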

Show saga/PM translations explicitly without exposing implementation:

illustrative
Scenario: Order completion triggers fulfillment
  Given an order with items:
    | sku    | quantity |
    | WIDGET | 3        |
  When the order is completed
  Then a fulfillment request is created with:
    | sku    | quantity |
    | WIDGET | 3        |

The same Gherkin scenarios validate all language implementations. angzarr-project/features/acceptance/ is the canonical source; each example repo pulls it in via the project submodule.

canonical feature layout
angzarr-project/features/acceptance/
├── poker_game.feature # End-to-end poker flow
└── sync_modes.feature # Commit vs cascade semantics

Each of the six language repos (Python, Rust, Go, Java, C#, C++) implements 169 step definitions against the same features — same behavior, same assertions, different runtimes.

illustrative
# Python uses behave
cd examples/python
behave features/
# Run specific tags
behave features/ --tags=@player

Acceptance tests support two backends through a CommandClient gRPC abstraction — the same step definitions drive either mode without change:

| Mode | Description | Infrastructure | Use Case |
|------|-------------|----------------|----------|
| In-process (default) | In-process RuntimeBuilder | Channel bus + SQLite | Fast local development |
| Direct | Remote gRPC against deployed cluster (chart 0.5.1) | NATS + Postgres | K8s validation, CI |

In-process mode uses test infrastructure, not production tooling:

  • Channel bus (in-memory) instead of NATS JetStream
  • SQLite (in-memory) instead of Postgres
  • Single process, no containers required

This trades production fidelity for speed. CI runs Direct mode against Kind with port-forwards on 1310/1311/1312 (player/table/hand).

illustrative
# In-process (default) — channel bus + SQLite
just test-unit
# Direct mode — real infrastructure via gRPC against Kind
just test-acceptance

test-unit drives step definitions against RuntimeBuilder in-process; test-acceptance drives the same steps through CommandClient gRPC calls to deployed aggregates. All six language repos are green against chart 0.5.1.
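
How one set of step definitions can drive both modes is sketched below in Python. The CommandClient protocol shape, both client classes, make_client, and the "RegisterPlayer" command name are assumptions for illustration; only the port numbers come from the text above, and the remote call itself is deliberately left out.

```python
from typing import Dict, List, Protocol


class CommandClient(Protocol):
    def execute(self, domain: str, command: str) -> List[str]: ...


class InProcessClient:
    """Stand-in for the in-process runtime: handles the command directly."""

    def execute(self, domain: str, command: str) -> List[str]:
        return [f"{domain}.{command}.handled"]


class DirectClient:
    """Stand-in for the remote client; only records where it would connect."""

    def __init__(self, endpoints: Dict[str, str]) -> None:
        self.endpoints = endpoints

    def execute(self, domain: str, command: str) -> List[str]:
        # A real client would issue a gRPC call to self.endpoints[domain].
        raise NotImplementedError("network call omitted in this sketch")


def make_client(mode: str) -> CommandClient:
    if mode == "direct":
        return DirectClient(
            {
                "player": "localhost:1310",
                "table": "localhost:1311",
                "hand": "localhost:1312",
            }
        )
    return InProcessClient()


def when_player_registers(client: CommandClient) -> List[str]:
    # A step definition sees only the CommandClient interface.
    return client.execute("player", "RegisterPlayer")


events = when_player_registers(make_client("in-process"))
direct = make_client("direct")
```

The step function never branches on the mode, so switching backends is a matter of which client the harness constructs.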


illustrative
# Unit tests
just test
# Integration tests
just integration
# Acceptance tests
just acceptance
# Contract tests (all backends)
just test-interfaces-all
illustrative
# Unit tests
cargo test --lib
# Contract tests (testcontainers for real backends)
STORAGE_BACKEND=sqlite cargo test --test interface_tests --features sqlite
STORAGE_BACKEND=postgres cargo test --test interface_tests --features postgres
# Acceptance tests
cargo test --package e2e --test acceptance

| Anti-Pattern | Problem | Fix |
|--------------|---------|-----|
| UI steps in Gherkin | “click”, “fill in”, “navigate” | Use business intent |
| Technical assertions | “database has row”, “event published” | Use business outcomes |
| Conditional logic | “if valid then X else Y” | Separate scenarios |
| Vague outcomes | “works correctly” | Be specific |
| Hardcoded test data | Magic numbers everywhere | Use meaningful descriptions |
| Skipping TDD | Tests written after code | Write test first, watch it fail |
| Testing mocks | Mock everything | Test real implementations |

A task is complete when:

  1. Implementation code exists
  2. Tests exist and exercise the implementation
  3. Tests actually run (not just specifications)
  4. Tests pass (cargo test, pytest, etc.)
  5. For Gherkin: step definitions implemented and runner passes

“Tests pass” means running the actual test command, not just writing test files.