Testing Strategy

Nothing is “done” until tests prove it works. Writing code without runnable tests is incomplete work.

TDD Workflow

Test-Driven Development is mandatory. Follow the Red-Green-Refactor cycle:

flowchart LR
    R[🔴 Red] --> G[🟢 Green] --> RF[🔵 Refactor] --> R

Phase	Action
Red	Write a failing test first. Verify it fails for the right reason. Ensure test isolation.
Green	Write minimal code to pass. No extras, no premature optimization.
Refactor	Clean up while keeping tests green. Apply SOLID principles. Remove duplication.
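Here is a minimal sketch of one Red-Green pass in Python, using plain `assert` statements so it runs under pytest or on its own. The `withdraw` function and its rule are hypothetical and exist only to illustrate the cycle. The tests come first and fail because `withdraw` does not exist yet (Red). The smallest implementation that satisfies them follows (Green).

```python
# Red: written first. Until withdraw() exists, both tests fail with NameError,
# which is the right reason: the behavior is simply missing.
def test_withdraw_reduces_bankroll():
    assert withdraw(bankroll=100, amount=30) == 70


def test_withdraw_more_than_bankroll_is_rejected():
    try:
        withdraw(bankroll=100, amount=130)
    except ValueError:
        return
    raise AssertionError("expected ValueError")


# Green: the minimal code that makes both tests pass. No logging, no
# configuration, no extra validation until a test demands it.
def withdraw(bankroll: int, amount: int) -> int:
    if amount > bankroll:
        raise ValueError("insufficient funds")
    return bankroll - amount
```

In the Refactor step you would then rename or extract helpers while both tests stay green.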

Critical: If a test fails, fix the issue and create a new commit. Never amend commits that have been pushed.

Four Levels of Testing

flowchart TB
    subgraph Acceptance["Acceptance Tests"]
        A1[Business behavior]
        A2[Full stack via Gherkin]
    end
    subgraph Contract["Contract Tests"]
        C1[Interface compliance]
        C2[Backend interchangeability]
    end
    subgraph Integration["Integration Tests"]
        I1[Framework internals]
        I2[In-process RuntimeBuilder]
    end
    subgraph Unit["Unit Tests"]
        U1[Pure functions]
        U2[No I/O]
    end

    Acceptance --> Contract --> Integration --> Unit

Level	Scope	Speed	Infrastructure
Unit	Single function/module	Fast (ms)	None
Integration	Framework plumbing	Fast (ms)	In-process (RuntimeBuilder, SQLite, channels)
Contract	Interface compliance	Medium (s)	Testcontainers
Acceptance	Business behavior	Medium (s)	In-process channel + SQLite or full stack (direct)

Why Cucumber/Gherkin

Angzarr uses Cucumber/Gherkin extensively—not just for acceptance tests, but also for contract tests and integration tests. The primary motivation is readability.

Ease of Reading

Gherkin specifications are easier to read than programmatic tests:

# Gherkin: Intent is immediately clear
Scenario: Events persist with correct sequence numbers
  Given an empty event store
  When I append 3 events to aggregate "player-123"
  Then the events have sequences 0, 1, 2
  And querying from sequence 1 returns 2 events

// Programmatic: Requires understanding test framework idioms
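#[tokio::test]
async fn test_events_persist_with_correct_sequence_numbers() {
    let store = setup_store().await;
    let root = uuid::Uuid::new_v4();

    for i in 0..3 {
        store.append(&root, make_event(i)).await.unwrap();
    }

    // All three events carry sequences 0, 1, 2
    let all = store.get_from(&root, 0).await.unwrap();
    let sequences: Vec<_> = all.iter().map(|e| e.sequence).collect();
    assert_eq!(sequences, vec![0, 1, 2]);

    // Querying from sequence 1 returns the last two events
    let events = store.get_from(&root, 1).await.unwrap();
    assert_eq!(events.len(), 2);
    assert_eq!(events[0].sequence, 1);
}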

Both test the same thing, but the Gherkin version:

Documents behavior in plain English
Serves as living specification
Is reviewable by non-developers
Makes test coverage gaps obvious

Where We Use Cucumber

Component	Test Type	Harness	Why
Core (`src/`, `tests/`)	Contract, Integration	Rust cucumber-rs	Readable specs for framework behavior
Clients (`client/{lang}/`)	Contract	Unified Rust gRPC harness	One source of truth across 6 languages
Examples (`examples/{lang}/`)	Acceptance	Per-language harnesses	Demonstrative for developers

Core Testing with Cucumber

Core framework tests use Gherkin for readability, not cross-language consistency:

tests/
├── interfaces/
│   ├── features/
│   │   ├── event_store.feature      # EventStore contract
│   │   ├── snapshot_store.feature   # SnapshotStore contract
│   │   ├── position_store.feature   # PositionStore contract
│   │   ├── event_bus.feature        # EventBus contract
│   │   └── dlq.feature              # DLQ behavior
│   └── steps/                       # Rust step definitions
├── client/
│   ├── features/
│   │   ├── aggregate-client.feature # Client SDK contracts
│   │   ├── command-builder.feature
│   │   └── query-client.feature
│   └── (Rust harness calls clients via gRPC)
└── acceptance/
    └── features/
        └── end_to_end.feature       # Full stack scenarios

Client Testing Architecture

Client libraries are tested with a unified Rust gRPC harness:

flowchart TB
    H["Rust Gherkin Harness (cucumber-rs)<br/>- Step definitions: tests/client/<br/>- Feature files: client/features/*.feature"]
    H -->|gRPC| Py[Python Client]
    H -->|gRPC| Go[Go Client]
    H -->|gRPC| Rs[Rust Client]
    H -->|gRPC| Ja[Java Client]
    H -->|gRPC| Cs[C# Client]
    H -->|gRPC| Cpp[C++ Client]

Why unified?

One source of truth for SDK contracts
Same tests validate all 6 language implementations
Tests actual gRPC protocol, not internal APIs

Client Contract Examples

# docs:start:aggregate_client_contract
Feature: AggregateClient - Command Execution
  The AggregateClient sends commands to aggregates for processing.
  Commands are validated, processed, and result in events being persisted.
  Supports async (fire-and-forget), sync, and speculative modes.

  Without command execution, the system cannot accept user actions or
  change aggregate state.
# docs:end:aggregate_client_contract

  # docs:start:client_command
  Scenario: Execute command on new aggregate
    Given a new aggregate root in domain "orders"
    When I execute a "CreateOrder" command with data "customer-123"
    Then the command should succeed
    And the response should contain 1 event
    And the event should have type "OrderCreated"

  Scenario: Execute command on existing aggregate
    Given an aggregate "orders" with root "order-001" at sequence 3
    When I execute a "AddItem" command at sequence 3
    Then the command should succeed
    And the response should contain events starting at sequence 3
  # docs:end:client_command

  # docs:start:client_concurrency
  Scenario: Command at wrong sequence fails with precondition error
    Given an aggregate "orders" with root "order-002" at sequence 5
    When I execute a command at sequence 3
    Then the command should fail with precondition error
    And the error should indicate sequence mismatch

  Scenario: Concurrent writes are detected
    Given an aggregate "orders" with root "order-003" at sequence 0
    When two commands are sent concurrently at sequence 0
    Then one should succeed
    And one should fail with precondition error
  # docs:end:client_concurrency

# docs:start:command_builder_contract
Feature: CommandBuilder - Fluent Command Construction
  The CommandBuilder provides a fluent API for constructing commands.
  It handles serialization, correlation IDs, sequence numbers, and
  type URLs while providing compile-time and runtime validation.

  The builder pattern enables both OO-style client usage and can be
  adapted for router-based implementations.
# docs:end:command_builder_contract

  Scenario: Build command with all required fields
    When I build a command for domain "orders" root "order-001"
      And I set the command type to "CreateOrder"
      And I set the command payload
    Then the built command should have domain "orders"
    And the built command should have root "order-001"
    And the built command should have type URL containing "CreateOrder"

  Scenario: Build with explicit correlation ID
    When I build a command for domain "orders"
      And I set correlation ID to "trace-123"
      And I set the command type and payload
    Then the built command should have correlation ID "trace-123"

  Scenario: Builder methods can be chained
    When I build a command using fluent chaining:
      """
      client.command("orders", root)
        .with_correlation_id("trace-456")
        .with_sequence(3)
        .with_command("CreateOrder", payload)
        .build()
      """
    Then the build should succeed
    And all chained values should be preserved
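These scenarios describe a builder that collects fields through chained calls and validates them in `build()`. The following is a minimal Python sketch of that shape. `CommandBuilder`, `Command`, the default sequence of 0 and the auto-generated correlation ID are assumptions made for illustration; they are not the client SDK's actual API. `type.googleapis.com/` is protobuf `Any`'s default type URL prefix.

```python
import uuid
from dataclasses import dataclass
from typing import Optional

TYPE_URL_PREFIX = "type.googleapis.com/"  # protobuf Any's default prefix


@dataclass(frozen=True)
class Command:
    domain: str
    root: str
    correlation_id: str
    sequence: int
    type_url: str
    payload: bytes


class CommandBuilder:
    """Hypothetical fluent builder: every with_* method returns self."""

    def __init__(self, domain: str, root: Optional[str] = None):
        self._domain = domain
        self._root = root
        self._correlation_id: Optional[str] = None
        self._sequence = 0
        self._type_name: Optional[str] = None
        self._payload = b""

    def with_correlation_id(self, correlation_id: str) -> "CommandBuilder":
        self._correlation_id = correlation_id
        return self

    def with_sequence(self, sequence: int) -> "CommandBuilder":
        self._sequence = sequence
        return self

    def with_command(self, type_name: str, payload: bytes) -> "CommandBuilder":
        self._type_name = type_name
        self._payload = payload
        return self

    def build(self) -> Command:
        # Runtime validation: a command without a type cannot be routed.
        if not self._type_name:
            raise ValueError("command type is required")
        return Command(
            domain=self._domain,
            root=self._root or str(uuid.uuid4()),  # new aggregate: fresh root
            # Generate a correlation ID only when the caller did not set one.
            correlation_id=self._correlation_id or str(uuid.uuid4()),
            sequence=self._sequence,
            type_url=TYPE_URL_PREFIX + self._type_name,
            payload=self._payload,
        )


cmd = (CommandBuilder("orders", "order-001")
       .with_correlation_id("trace-456")
       .with_sequence(3)
       .with_command("CreateOrder", b"customer-123")
       .build())
```

Because every setter returns the builder, chaining and step-by-step construction produce the same `Command`.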

just test-client python    # Test Python client via Rust harness
just test-client go        # Test Go client via Rust harness
just test-clients          # Test all clients

Example Testing Architecture

Example implementations use per-language Gherkin harnesses:

examples/features/unit/*.feature  (shared specifications)
           │
           ├── Python: behave + examples/python/features/steps/
           ├── Go: godog + examples/go/tests/steps/
           ├── Rust: cucumber-rs + examples/rust/tests/
           ├── Java: cucumber-junit5 + examples/java/tests/
           ├── C#: SpecFlow + examples/csharp/Tests/Steps/
           └── C++: cucumber-cpp + examples/cpp/tests/

Why per-language?

Demonstrative for non-polyglot developers
Developers see Gherkin AND step definitions in their language
Educational code they can learn from and copy

just examples python test  # behave
just examples go test      # godog
just examples rust test    # cucumber-rs

Unit Tests

No external dependencies. Tests interact only with the system under test—no I/O, no concurrency, no infrastructure.

Business Logic Only

Test business logic directly. No mocks. No frameworks. No infrastructure.

Aggregates are classes: instantiate, seed state, call a @handles method, assert on the returned event. Appliers are equally testable — pass a fresh state, call the method, assert on the mutation.

Pass in state structs directly
Assert on returned events
No database connections, no message buses, no HTTP clients

If you’re writing mocks, you’re testing the wrong thing. Business logic should be deterministic methods that take data in and return data out.

Direct Method Testing

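import pytest

# Player, DepositFunds and CommandRejectedError come from the example's
# player aggregate and proto modules; import paths vary by project.

def test_deposit_increases_bankroll():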
    agg = Player()
    agg.state.registered = True
    agg.state.bankroll = 1000

    event = agg.handle_deposit_funds(DepositFunds(amount=500))

    assert event.new_bankroll == 1500

def test_deposit_rejected_when_not_registered():
    agg = Player()
    agg.state.registered = False

    with pytest.raises(CommandRejectedError):
        agg.handle_deposit_funds(DepositFunds(amount=500))

Why this works:

@handles methods are ordinary methods — call them directly
No mocking required: pass state directly, assert on returned events
Proto serialization tested separately from business logic
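The tests above rely on a `Player` aggregate from the example code. The self-contained stand-in below shows the same pattern end to end and adds an applier test. Everything in it is hypothetical. The `handles` decorator here only tags a method with its command type, and `Player`, `PlayerState`, `DepositFunds`, `FundsDeposited` and `CommandRejectedError` are plain Python classes, not SDK types.

```python
from dataclasses import dataclass


def handles(command_type):
    """Hypothetical stand-in: tags a method with the command it handles."""
    def tag(method):
        method.handles = command_type
        return method
    return tag


class CommandRejectedError(Exception):
    pass


@dataclass
class DepositFunds:
    amount: int


@dataclass
class FundsDeposited:
    amount: int
    new_bankroll: int


@dataclass
class PlayerState:
    registered: bool = False
    bankroll: int = 0


class Player:
    def __init__(self) -> None:
        self.state = PlayerState()

    @handles(DepositFunds)
    def handle_deposit_funds(self, cmd: DepositFunds) -> FundsDeposited:
        # Handler: validates and returns an event; it does not mutate state.
        if not self.state.registered:
            raise CommandRejectedError("player is not registered")
        return FundsDeposited(amount=cmd.amount,
                              new_bankroll=self.state.bankroll + cmd.amount)

    @staticmethod
    def apply_funds_deposited(state: PlayerState, event: FundsDeposited) -> None:
        # Applier: mutates state from an event; no validation, no I/O.
        state.bankroll = event.new_bankroll


def test_applier_sets_bankroll():
    state = PlayerState(registered=True, bankroll=1000)
    Player.apply_funds_deposited(state, FundsDeposited(amount=500, new_bankroll=1500))
    assert state.bankroll == 1500
```

Keeping handlers free of state mutation means a handler test asserts only on the returned event, and an applier test asserts only on the state.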

Test Naming

Use the pattern: test_<action>_<condition>_<expected_result>

Language	Convention	Example
Python	snake_case	`test_deposit_with_zero_amount_raises_error`
Rust	snake_case	`test_deposit_with_zero_amount_raises_error`
Go	PascalCase with `Test` prefix	`TestDepositWithZeroAmountRaisesError`
Java	camelCase	`depositWithZeroAmountRaisesError`
C#	PascalCase	`DepositWithZeroAmountRaisesError`

Prioritize readability over rigid format. Tests are documentation.

Event Sourcing: The Any Boundary

Events cross a serialization boundary between business logic and the framework:

flowchart TB
    subgraph BL["Business Logic"]
        C["compute(cmd, state) → raw event"]
        BS["build_state(state, events)"]
        AE["_apply_event(state, event_any)"]
        MS[mutate state]
    end
    subgraph FW["Framework"]
        AP["Any.Pack(event)"]
        EB["EventBook.pages[].event<br/>(Any-wrapped)"]
        EX[extract events from pages]
        UP["event_any.Unpack(typed_event)"]
    end
    C --> AP
    AP -->|persist to EventBook| EB
    EB --> EX
    EX --> BS
    BS --> AE
    AE --> UP
    UP --> MS

The framework stores events as opaque Any blobs—it doesn’t know business types. Business logic must decode the Any because only it knows PlayerRegistered, FundsDeposited, etc.

Full event sourcing test cycle:

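from google.protobuf.any_pb2 import Any

# PlayerState, DepositFunds, compute and build_state come from the example's
# player module; import paths vary by project.

def test_deposit_full_cycle():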
    # 1. Start with state
    state = PlayerState(bankroll=100)
    cmd = DepositFunds(amount=50)

    # 2. compute() produces raw event
    event = compute(cmd, state)

    # 3. Wrap in Any (what framework does for persistence)
    event_any = Any()
    event_any.Pack(event, type_url_prefix="type.googleapis.com/")

    # 4. build_state applies Any-wrapped events → new state
    new_state = build_state(state, [event_any])

    assert new_state.bankroll == 150

Tests mimic the production boundary exactly—no special test-only interfaces.
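On the business side, decoding the Any comes down to a dispatch on its type URL. So that the sketch runs without generated code, it uses a stdlib stand-in for `google.protobuf.Any`: a type URL plus serialized bytes, with JSON in place of protobuf encoding. `AnyStandIn`, `pack`, `EVENT_TYPES`, and this version of `apply_event`/`build_state` are all hypothetical. With real protobuf messages, the lookup would use `Any.Is()` and `Any.Unpack()` against each known event type instead.

```python
import json
from dataclasses import asdict, dataclass, replace

TYPE_URL_PREFIX = "type.googleapis.com/"


@dataclass(frozen=True)
class AnyStandIn:
    """Opaque envelope: the framework sees only a type URL and bytes."""
    type_url: str
    value: bytes


@dataclass(frozen=True)
class PlayerRegistered:
    name: str


@dataclass(frozen=True)
class FundsDeposited:
    amount: int


@dataclass(frozen=True)
class PlayerState:
    registered: bool = False
    bankroll: int = 0


# Only business code holds this table: type name -> concrete event class.
EVENT_TYPES = {cls.__name__: cls for cls in (PlayerRegistered, FundsDeposited)}


def pack(event) -> AnyStandIn:
    """What the framework does before persisting (JSON stands in for protobuf)."""
    return AnyStandIn(TYPE_URL_PREFIX + type(event).__name__,
                      json.dumps(asdict(event)).encode())


def apply_event(state: PlayerState, event_any: AnyStandIn) -> PlayerState:
    type_name = event_any.type_url.rsplit("/", 1)[-1]
    event_cls = EVENT_TYPES.get(type_name)
    if event_cls is None:
        raise ValueError(f"unknown event type: {event_any.type_url}")
    event = event_cls(**json.loads(event_any.value))  # the "Unpack" step
    if isinstance(event, PlayerRegistered):
        return replace(state, registered=True)
    if isinstance(event, FundsDeposited):
        return replace(state, bankroll=state.bankroll + event.amount)
    return state


def build_state(state: PlayerState, events) -> PlayerState:
    for event_any in events:
        state = apply_event(state, event_any)
    return state
```

Any component that lacks `EVENT_TYPES` can store and forward these envelopes but cannot interpret them. That is the framework's position.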

Integration Tests

Test Angzarr framework internals using synthetic aggregates (EchoAggregate, MultiEventAggregate). Prove the plumbing works—not business logic.

Infrastructure: In-process only. Uses RuntimeBuilder with SQLite and channel bus. No containers.

What they cover:

Event persistence and sequence numbering
IPC event bus (named pipes, domain filtering)
gRPC over UDS transport
Channel bus pub/sub delivery
Saga activation and cross-domain command routing
Snapshot/recovery, lossy bus resilience

Gherkin Example

Feature: EventBus interface
  The EventBus distributes committed events to interested subscribers. After
  an aggregate persists events, the bus broadcasts them to sagas, projectors,
  and process managers that need to react.

  Background:
    Given an EventBus backend

  Scenario: Handlers only receive events from their subscribed domain
    Given the player-projector subscribes only to the player domain
    When events are published to player and table domains
    Then the player-projector receives only player events
    And never sees table events which are filtered out by the bus

  Scenario: Cross-domain handlers can subscribe to multiple domains
    Given the output-projector subscribed to player and table domains
    When events are published to player, table, and hand domains
    Then the output-projector receives player events because it subscribed
    And the output-projector receives table events because it subscribed
    And the output-projector does NOT receive hand events because it did not subscribe

  Scenario: Events arrive in sequence order from a single publisher
    Given a single-threaded hand aggregate publishing events
    And a projector subscribed to hand
    When events with sequences 0, 1, 2, 3, 4 are published in order
    Then the projector receives them in sequence order: 0, 1, 2, 3, 4
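These scenarios pin down two behaviors: the bus filters events by domain, and events from a single publisher arrive in order. A minimal in-memory bus illustrates both. This is a hypothetical Python sketch built on one `queue.Queue` per subscriber, not Angzarr's channel bus. The `ChannelBus`, `subscribe`, `publish` and `drain` names are assumptions.

```python
import queue
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Event:
    domain: str
    sequence: int


class ChannelBus:
    """Hypothetical in-memory bus: one FIFO queue per subscriber."""

    def __init__(self) -> None:
        self._subscribers: List[Tuple[FrozenSet[str], queue.Queue]] = []

    def subscribe(self, *domains: str) -> queue.Queue:
        inbox: queue.Queue = queue.Queue()
        self._subscribers.append((frozenset(domains), inbox))
        return inbox

    def publish(self, event: Event) -> None:
        # The bus filters by domain, so handlers never see unrelated events.
        for domains, inbox in self._subscribers:
            if event.domain in domains:
                inbox.put(event)  # FIFO: arrival order matches publish order


def drain(inbox: queue.Queue) -> List[Event]:
    events = []
    while not inbox.empty():
        events.append(inbox.get_nowait())
    return events


bus = ChannelBus()
player_projector = bus.subscribe("player")
output_projector = bus.subscribe("player", "table")
for sequence, domain in enumerate(["player", "table", "hand", "player"]):
    bus.publish(Event(domain, sequence))
```

After the loop, `player_projector` holds only the two player events. `output_projector` holds the player and table events, and nothing from the hand domain.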

Contract Tests

Verify all storage implementations behave identically. Same interface, interchangeable backends.

Infrastructure: Testcontainers for every networked backend: real Postgres, Redis, NATS, and immudb. This is the only level that provisions containers; the SQLite backend runs in-process and needs none.

What contracts verify:

EventStore: add, get, get_from, get_from_to, list_roots, sequence numbering
SnapshotStore: save, load, delete
PositionStore: get, set, checkpoint semantics

Location: tests/interfaces/

Gherkin Contract Specifications

Contract tests are written in Gherkin for readability. The same feature files run against every backend:

Feature: EventStore interface
  The EventStore is the source of truth for all state changes in the system.
  Every aggregate's current state is derived by replaying its events. This
  immutability provides a complete audit trail, enables temporal queries, and
  allows the system to reconstruct any aggregate's state at any point in history.

  Background:
    Given an EventStore backend

  Scenario: First event in an aggregate's history starts at sequence 0
    Given an aggregate "player" with no events
    When I add 1 event to the aggregate
    Then the aggregate should have 1 event
    And the first event should have sequence 0

  Scenario: Multiple events from a single command receive consecutive sequences
    Given an aggregate "player" with no events
    When I add 5 events to the aggregate
    Then the aggregate should have 5 events
    And events should have consecutive sequences starting from 0

  Scenario: Concurrent writers are detected via sequence mismatch
    Given an aggregate "player" with 3 events
    When I try to add an event with sequence 1
    Then the operation should fail with a sequence conflict

  Scenario: Stale writers cannot overwrite history
    Given an aggregate "player" with 3 events
    When I try to add an event with sequence 0
    Then the operation should fail with a sequence conflict

Running Contract Tests

# Run against SQLite (fast, no containers)
STORAGE_BACKEND=sqlite cargo test --test interfaces --features sqlite

# Run against PostgreSQL (testcontainers)
STORAGE_BACKEND=postgres cargo test --test interfaces --features postgres

# Run against immudb (testcontainers)
STORAGE_BACKEND=immudb cargo test --test interfaces --features immudb

Contract tests ensure you can swap backends without behavior changes. If SQLite passes but Postgres fails, the Postgres implementation has a bug.
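"Same feature files, every backend" can be modeled as a single contract function that takes a store factory and runs identical checks against each implementation. The Python sketch below is hypothetical throughout. `EventStore` is a `typing.Protocol` with an assumed `add(root, events, expected_sequence)` / `get(root)` shape, `SequenceConflict` is an assumed error type, and the in-memory store stands in for SQLite, Postgres or immudb.

```python
from typing import Callable, Dict, List, Protocol


class SequenceConflict(Exception):
    pass


class EventStore(Protocol):
    def add(self, root: str, events: List[str], expected_sequence: int) -> None: ...
    def get(self, root: str) -> List[tuple]: ...


class InMemoryEventStore:
    def __init__(self) -> None:
        self._events: Dict[str, List[tuple]] = {}

    def add(self, root: str, events: List[str], expected_sequence: int) -> None:
        history = self._events.setdefault(root, [])
        # Optimistic concurrency: the writer must supply the next sequence.
        if expected_sequence != len(history):
            raise SequenceConflict(f"expected {len(history)}, got {expected_sequence}")
        for offset, payload in enumerate(events):
            history.append((expected_sequence + offset, payload))

    def get(self, root: str) -> List[tuple]:
        return list(self._events.get(root, []))


def check_event_store_contract(make_store: Callable[[], EventStore]) -> None:
    """Mirrors the EventStore scenarios; each check gets a fresh store."""
    store = make_store()
    store.add("player", ["registered"], expected_sequence=0)
    assert [seq for seq, _ in store.get("player")] == [0]

    store = make_store()
    store.add("player", [f"e{i}" for i in range(5)], expected_sequence=0)
    assert [seq for seq, _ in store.get("player")] == [0, 1, 2, 3, 4]

    for stale in (1, 0):  # concurrent writer, then stale writer
        store = make_store()
        store.add("player", ["a", "b", "c"], expected_sequence=0)
        try:
            store.add("player", ["late"], expected_sequence=stale)
        except SequenceConflict:
            pass
        else:
            raise AssertionError(f"sequence {stale} should conflict")
        assert len(store.get("player")) == 3  # history untouched


# Each backend registers a factory; the contract itself never changes.
BACKENDS = {"memory": InMemoryEventStore}

for name, factory in BACKENDS.items():
    check_event_store_contract(factory)
```

Adding a backend only means adding an entry to `BACKENDS`. A backend that skips the sequence check fails the same contract.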

Testcontainers

Contract tests use testcontainers to provision real databases:

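use testcontainers::{
    core::{IntoContainerPort, WaitFor},
    runners::AsyncRunner,
    ContainerAsync, GenericImage, ImageExt,
};

async fn start_postgres() -> (ContainerAsync<GenericImage>, String) {
    let image = GenericImage::new("postgres", "16")
        .with_exposed_port(5432.tcp())
        // Postgres writes its server log, including this readiness line, to stderr
        .with_wait_for(WaitFor::message_on_stderr(
            "database system is ready",
        ));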

    let container = image
        .with_env_var("POSTGRES_USER", "testuser")
        .with_env_var("POSTGRES_PASSWORD", "testpass")
        .with_env_var("POSTGRES_DB", "testdb")
        .start()
        .await
        .expect("Failed to start container");

    let host_port = container.get_host_port_ipv4(5432).await.unwrap();
    let url = format!("postgres://testuser:testpass@localhost:{}/testdb", host_port);

    (container, url)
}

Benefits:

Zero setup — tests start containers automatically
Isolation — each test gets fresh state
Realistic — tests run against real databases, not mocks

Acceptance Tests

Test business behavior through the full stack. Written in Gherkin, describing what the system does from a business perspective.

Location: examples/rust/e2e/tests/features/

Gherkin Authoring

Gherkin is business-readable specification, not test code. Describe what the system does and why it matters—never how.

The litmus test: “Will this wording change if the implementation changes?” If yes, abstract to behavior.

Declarative Over Imperative

# Wrong: UI choreography
When I click "Add to Cart"
And I click "Checkout"
And I fill in "Card Number" with "4111..."

# Right: Business intent
When I purchase the items in my cart

Given-When-Then Semantics

Keyword	Purpose	Example
Given	Establish context (past state)	`Given a player with $500 in their bankroll`
When	Single triggering action	`When the player reserves $200 for the table`
Then	Verify business outcomes	`Then the player's available balance is $300`

Business Language

Technical (Avoid)	Business (Prefer)
API returns 201	Order is confirmed
Database has record	Customer exists
Event is published	Notification is sent
State machine transitions	Hand progresses to showdown

Exception: Framework tests (event stores, buses) use technical vocabulary—it’s their domain.

One Scenario, One Behavior

Each scenario tests exactly one thing. Multiple When-Then pairs = multiple scenarios.

Feature Preambles

Open features with context explaining what this capability enables, why it matters, and what breaks if it doesn’t work:

Feature: Player fund reservation

  Players must reserve funds when joining a table. This ensures:
  - Players can cover their buy-in before sitting down
  - Funds are locked and cannot be double-spent across tables

  Without fund reservation, players could join multiple tables with
  the same bankroll, creating settlement disputes.

Error Cases Are First-Class

Don’t just test happy paths. Business rules live in constraints:

Scenario: Cannot reserve more than available balance
  Given Alice has $500 available
  When Alice tries to reserve $600
  Then the request fails with "insufficient funds"
  And Alice's available balance remains $500
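At the unit level, the same rule becomes a pure function. Its rejection path gets the same care as its success path, including an explicit check that nothing changed. The `FundsState`, `FundsReserved`, `reserve` and `InsufficientFunds` names below are hypothetical, and amounts are whole dollars for simplicity.

```python
from dataclasses import dataclass


class InsufficientFunds(Exception):
    pass


@dataclass(frozen=True)
class FundsState:
    bankroll: int
    reserved: int = 0

    @property
    def available(self) -> int:
        return self.bankroll - self.reserved


@dataclass(frozen=True)
class FundsReserved:
    amount: int


def reserve(state: FundsState, amount: int) -> FundsReserved:
    # Business rule: a reservation may never exceed the available balance.
    if amount > state.available:
        raise InsufficientFunds("insufficient funds")
    return FundsReserved(amount=amount)


def test_cannot_reserve_more_than_available_balance():
    alice = FundsState(bankroll=500)
    try:
        reserve(alice, 600)
    except InsufficientFunds as err:
        assert str(err) == "insufficient funds"
    else:
        raise AssertionError("reservation should have been rejected")
    # The rejection produced no event, so the available balance is untouched.
    assert alice.available == 500
```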

Cross-Domain Scenarios

Show saga/PM translations explicitly without exposing implementation:

Scenario: Order completion triggers fulfillment
  Given an order with items:
    | sku    | quantity |
    | WIDGET | 3        |
  When the order is completed
  Then a fulfillment request is created with:
    | sku    | quantity |
    | WIDGET | 3        |
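Behind a scenario like this, the saga can be a pure translation from one domain's event into another domain's command, which makes it as easy to unit-test as an aggregate. The following is a hypothetical Python sketch. `OrderCompleted`, `RequestFulfillment`, `LineItem` and `order_completed_to_fulfillment` are illustrative names, not Angzarr types.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LineItem:
    sku: str
    quantity: int


@dataclass(frozen=True)
class OrderCompleted:  # event in the "orders" domain
    order_id: str
    items: Tuple[LineItem, ...]


@dataclass(frozen=True)
class RequestFulfillment:  # command for the "fulfillment" domain
    order_id: str
    items: Tuple[LineItem, ...]


def order_completed_to_fulfillment(event: OrderCompleted) -> RequestFulfillment:
    # Pure translation: no I/O, so it can be tested by calling it directly.
    return RequestFulfillment(order_id=event.order_id, items=event.items)


def test_order_completion_triggers_fulfillment():
    event = OrderCompleted("order-001", (LineItem("WIDGET", 3),))
    command = order_completed_to_fulfillment(event)
    assert command.order_id == "order-001"
    assert command.items == (LineItem("WIDGET", 3),)
```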

Shared Feature Files

The same Gherkin scenarios validate all language implementations. angzarr-project/features/acceptance/ is the canonical source; each example repo pulls it in via the project submodule.

angzarr-project/features/acceptance/
├── poker_game.feature      # End-to-end poker flow
└── sync_modes.feature      # Commit vs cascade semantics

Each of the six language repos (Python, Rust, Go, Java, C#, C++) implements 169 step definitions against the same features — same behavior, same assertions, different runtimes.

Running Acceptance Tests

Python

# Python uses behave
cd examples/python
behave features/

# Run specific tags
behave features/ --tags=@player

Two Execution Modes

Acceptance tests support two backends through a CommandClient gRPC abstraction — the same step definitions drive either mode without change:

Mode	Description	Infrastructure	Use Case
In-process (default)	In-process `RuntimeBuilder`	Channel bus + SQLite	Fast local development
Direct	Remote gRPC against deployed cluster (chart `0.5.1`)	NATS + Postgres	K8s validation, CI

In-process mode uses test infrastructure, not production tooling:

Channel bus (in-memory) instead of NATS JetStream
SQLite (in-memory) instead of Postgres
Single process, no containers required

This trades production fidelity for speed. CI runs Direct mode against Kind with port-forwards on 1310/1311/1312 (player/table/hand).

# In-process (default) — channel bus + SQLite
just test-unit

# Direct mode — real infrastructure via gRPC against Kind
just test-acceptance

test-unit drives step definitions against RuntimeBuilder in-process; test-acceptance drives the same steps through CommandClient gRPC calls to deployed aggregates. All six language repos are green against chart 0.5.1.
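Both modes work because step definitions depend only on a narrow client interface, with one implementation per mode. In the hypothetical Python sketch below, `CommandClient` is modeled as a `typing.Protocol` and `InProcessClient` stands in for the `RuntimeBuilder`-backed mode. The gRPC-backed direct client is only described in a comment, since its real API is not shown here. The `execute` signature and `step_execute_create_order` are assumptions.

```python
from typing import Dict, List, Protocol, Tuple


class CommandClient(Protocol):
    def execute(self, domain: str, root: str, command: str, sequence: int) -> List[int]: ...


class InProcessClient:
    """Stand-in for the in-process mode: aggregate state lives in this process."""

    def __init__(self) -> None:
        self._next_sequence: Dict[Tuple[str, str], int] = {}

    def execute(self, domain: str, root: str, command: str, sequence: int) -> List[int]:
        # The command type is ignored here; only sequencing is modeled.
        key = (domain, root)
        expected = self._next_sequence.get(key, 0)
        if sequence != expected:
            raise RuntimeError(f"precondition failed: expected sequence {expected}")
        self._next_sequence[key] = sequence + 1
        return [sequence]  # sequence of the single event this command produced


# A direct-mode client would implement the same execute() by issuing gRPC
# calls to the deployed aggregates; the step definitions would not change.


def step_execute_create_order(client: CommandClient) -> List[int]:
    """A step definition sees only the CommandClient interface."""
    return client.execute("orders", "order-001", "CreateOrder", sequence=0)
```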

Running Tests

Using just (recommended)

# Unit tests
just test

# Integration tests
just integration

# Acceptance tests
just acceptance

# Contract tests (all backends)
just test-interfaces-all

Direct commands

# Unit tests
cargo test --lib

# Contract tests (testcontainers for real backends)
STORAGE_BACKEND=sqlite cargo test --test interface_tests --features sqlite
STORAGE_BACKEND=postgres cargo test --test interface_tests --features postgres

# Acceptance tests
cargo test --package e2e --test acceptance

Anti-Patterns

Anti-Pattern	Problem	Fix
UI steps in Gherkin	“click”, “fill in”, “navigate”	Use business intent
Technical assertions	“database has row”, “event published”	Use business outcomes
Conditional logic	“if valid then X else Y”	Separate scenarios
Vague outcomes	“works correctly”	Be specific
Hardcoded test data	Magic numbers everywhere	Use meaningful descriptions
Skipping TDD	Tests written after code	Write test first, watch it fail
Testing mocks	Mock everything	Test real implementations

Definition of Done

A task is complete when:

Implementation code exists
Tests exist and exercise the implementation
Tests actually run (not just specifications)
Tests pass (cargo test, pytest, etc.)
For Gherkin: step definitions implemented and runner passes

“Tests pass” means running the actual test command, not just writing test files.