
Testing Strategy

Nothing is "done" until tests prove it works. Writing code without runnable tests is incomplete work.


TDD Workflow

Test-Driven Development is mandatory. Follow the Red-Green-Refactor cycle:

| Phase | Action |
|-------|--------|
| Red | Write a failing test first. Verify it fails for the right reason. Ensure test isolation. |
| Green | Write minimal code to pass. No extras, no premature optimization. |
| Refactor | Clean up while keeping tests green. Apply SOLID principles. Remove duplication. |

Critical: If a test fails, fix the issue and create a new commit. Never amend commits that have been pushed.


Four Levels of Testing

| Level | Scope | Speed | Infrastructure |
|-------|-------|-------|----------------|
| Unit | Single function/module | Fast (ms) | None |
| Integration | Framework plumbing | Fast (ms) | In-process (RuntimeBuilder, SQLite, channels) |
| Contract | Interface compliance | Medium (s) | Testcontainers |
| Acceptance | Business behavior | Medium (s) | Channel + SQLite (standalone) or full stack (direct) |

Why Cucumber/Gherkin

Angzarr uses Cucumber/Gherkin extensively—not just for acceptance tests, but also for contract tests and integration tests. The primary motivation is readability.

Ease of Reading

Gherkin specifications are easier to read than programmatic tests:

illustrative
# Gherkin: Intent is immediately clear
Scenario: Events persist with correct sequence numbers
  Given an empty event store
  When I append 3 events to aggregate "player-123"
  Then the events have sequences 0, 1, 2
  And querying from sequence 1 returns 2 events
illustrative
// Programmatic: Requires understanding test framework idioms
#[tokio::test]
async fn test_events_persist_with_correct_sequence_numbers() {
    let store = setup_store().await;
    let root = uuid::Uuid::new_v4();

    for i in 0..3 {
        store.append(&root, make_event(i)).await.unwrap();
    }

    let events = store.get_from(&root, 1).await.unwrap();
    assert_eq!(events.len(), 2);
    assert_eq!(events[0].sequence, 1);
}

Both test the same thing, but the Gherkin version:

  • Documents behavior in plain English
  • Serves as living specification
  • Is reviewable by non-developers
  • Makes test coverage gaps obvious
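To make the "living specification" point concrete, here is a minimal sketch of how plain-English steps become executable. It is a toy step registry written for illustration, not the behave or cucumber-rs API; the `step` decorator, `run_scenario`, and the dict-based store are all hypothetical names.

```python
import re

STEPS = []  # (compiled pattern, function) pairs; a toy registry, not a real framework


def step(pattern):
    """Register a function for Gherkin lines whose text matches `pattern`."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


@step(r"an empty event store")
def empty_store(ctx):
    ctx["store"] = {}


@step(r'I append (\d+) events to aggregate "([^"]+)"')
def append_events(ctx, count, root):
    events = ctx["store"].setdefault(root, [])
    for _ in range(int(count)):
        events.append({"sequence": len(events)})


@step(r"the events have sequences ([\d, ]+)")
def check_sequences(ctx, expected):
    (events,) = ctx["store"].values()
    assert [e["sequence"] for e in events] == [int(n) for n in expected.split(",")]


@step(r"querying from sequence (\d+) returns (\d+) events")
def check_query(ctx, start, count):
    (events,) = ctx["store"].values()
    assert len([e for e in events if e["sequence"] >= int(start)]) == int(count)


def run_scenario(text):
    """Run each Given/When/Then/And line through the first matching step."""
    ctx = {}
    executed = 0
    for line in text.strip().splitlines():
        body = re.sub(r"^\s*(Given|When|Then|And)\s+", "", line)
        for pattern, func in STEPS:
            match = pattern.fullmatch(body)
            if match:
                func(ctx, *match.groups())
                executed += 1
                break
        else:
            raise LookupError(f"no step matches: {body!r}")
    return executed


SCENARIO = """
Given an empty event store
When I append 3 events to aggregate "player-123"
Then the events have sequences 0, 1, 2
And querying from sequence 1 returns 2 events
"""

run_scenario(SCENARIO)
```

A step with no matching definition raises `LookupError`, which is the toy equivalent of a runner reporting an undefined step: the specification cannot silently drift away from the code.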

Where We Use Cucumber

| Component | Test Type | Harness | Why |
|-----------|-----------|---------|-----|
| Core (src/, tests/) | Contract, Integration | Rust cucumber-rs | Readable specs for framework behavior |
| Clients (client/{lang}/) | Contract | Unified Rust gRPC harness | One source of truth across 6 languages |
| Examples (examples/{lang}/) | Acceptance | Per-language harnesses | Demonstrative for developers |

Core Testing with Cucumber

Core framework tests use Gherkin for readability, not cross-language consistency:

illustrative - directory structure
tests/
├── interfaces/
│   ├── features/
│   │   ├── event_store.feature       # EventStore contract
│   │   ├── snapshot_store.feature    # SnapshotStore contract
│   │   ├── position_store.feature    # PositionStore contract
│   │   ├── event_bus.feature         # EventBus contract
│   │   └── dlq.feature               # DLQ behavior
│   └── steps/                        # Rust step definitions
├── client/
│   ├── features/
│   │   ├── aggregate-client.feature  # Client SDK contracts
│   │   ├── command-builder.feature
│   │   └── query-client.feature
│   └── (Rust harness calls clients via gRPC)
└── acceptance/
    └── features/
        └── end_to_end.feature        # Full stack scenarios

Client Testing Architecture

Client libraries are tested with a unified Rust gRPC harness:

illustrative - architecture diagram
┌───────────────────────────────────────────────────────────────┐
│  Rust Gherkin Harness (cucumber-rs)                           │
│  - Step definitions: tests/client/                            │
│  - Feature files: client/features/*.feature                   │
└───────────────────────────────┬───────────────────────────────┘
                                │ gRPC
     ┌──────────┬──────────┬────┴─────┬──────────┬──────────┐
     ▼          ▼          ▼          ▼          ▼          ▼
┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│ Python │ │   Go   │ │  Rust  │ │  Java  │ │   C#   │ │  C++   │
│ Client │ │ Client │ │ Client │ │ Client │ │ Client │ │ Client │
└────────┘ └────────┘ └────────┘ └────────┘ └────────┘ └────────┘

Why unified?

  • One source of truth for SDK contracts
  • Same tests validate all 6 language implementations
  • Tests actual gRPC protocol, not internal APIs

Client Contract Examples

client/features/aggregate_client.feature
# docs:start:aggregate_client_contract
Feature: AggregateClient - Command Execution
  The AggregateClient sends commands to aggregates for processing.
  Commands are validated, processed, and result in events being persisted.
  Supports async (fire-and-forget), sync, and speculative modes.

  Without command execution, the system cannot accept user actions or
  change aggregate state.
  # docs:end:aggregate_client_contract

  # docs:start:client_command
  Scenario: Execute command on new aggregate
    Given a new aggregate root in domain "orders"
    When I execute a "CreateOrder" command with data "customer-123"
    Then the command should succeed
    And the response should contain 1 event
    And the event should have type "OrderCreated"

  Scenario: Execute command on existing aggregate
    Given an aggregate "orders" with root "order-001" at sequence 3
    When I execute a "AddItem" command at sequence 3
    Then the command should succeed
    And the response should contain events starting at sequence 3
  # docs:end:client_command

  # docs:start:client_concurrency
  Scenario: Command at wrong sequence fails with precondition error
    Given an aggregate "orders" with root "order-002" at sequence 5
    When I execute a command at sequence 3
    Then the command should fail with precondition error
    And the error should indicate sequence mismatch

  Scenario: Concurrent writes are detected
    Given an aggregate "orders" with root "order-003" at sequence 0
    When two commands are sent concurrently at sequence 0
    Then one should succeed
    And one should fail with precondition error
  # docs:end:client_concurrency
client/features/command_builder.feature
# docs:start:command_builder_contract
Feature: CommandBuilder - Fluent Command Construction
  The CommandBuilder provides a fluent API for constructing commands.
  It handles serialization, correlation IDs, sequence numbers, and
  type URLs while providing compile-time and runtime validation.

  The builder pattern enables both OO-style client usage and can be
  adapted for router-based implementations.
  # docs:end:command_builder_contract

  Scenario: Build command with all required fields
    When I build a command for domain "orders" root "order-001"
    And I set the command type to "CreateOrder"
    And I set the command payload
    Then the built command should have domain "orders"
    And the built command should have root "order-001"
    And the built command should have type URL containing "CreateOrder"

  Scenario: Build with explicit correlation ID
    When I build a command for domain "orders"
    And I set correlation ID to "trace-123"
    And I set the command type and payload
    Then the built command should have correlation ID "trace-123"

  Scenario: Builder methods can be chained
    When I build a command using fluent chaining:
      """
      client.command("orders", root)
        .with_correlation_id("trace-456")
        .with_sequence(3)
        .with_command("CreateOrder", payload)
        .build()
      """
    Then the build should succeed
    And all chained values should be preserved
illustrative
just test-client python    # Test Python client via Rust harness
just test-client go        # Test Go client via Rust harness
just test-clients          # Test all clients
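
The fluent-chaining scenario in the CommandBuilder feature can be pictured with a small sketch. This is not the Angzarr SDK: the class, its defaults (sequence 0, a generated correlation ID), and the `type.googleapis.com/` prefix are assumptions chosen only to mirror the method names that appear in the feature file.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    domain: str
    root: str
    type_url: str
    payload: bytes
    correlation_id: str
    sequence: int


class CommandBuilder:
    """Hypothetical fluent builder; every with_* method returns self for chaining."""

    def __init__(self, domain, root):
        self._domain = domain
        self._root = root
        self._correlation_id = None
        self._sequence = 0  # assumed default
        self._type_name = None
        self._payload = None

    def with_correlation_id(self, correlation_id):
        self._correlation_id = correlation_id
        return self

    def with_sequence(self, sequence):
        self._sequence = sequence
        return self

    def with_command(self, type_name, payload):
        self._type_name = type_name
        self._payload = payload
        return self

    def build(self):
        # Runtime validation: a command without a type or payload is rejected.
        if self._type_name is None or self._payload is None:
            raise ValueError("command type and payload are required")
        return Command(
            domain=self._domain,
            root=self._root,
            type_url=f"type.googleapis.com/{self._type_name}",  # assumed prefix
            payload=self._payload,
            correlation_id=self._correlation_id or str(uuid.uuid4()),
            sequence=self._sequence,
        )


command = (
    CommandBuilder("orders", "order-001")
    .with_correlation_id("trace-456")
    .with_sequence(3)
    .with_command("CreateOrder", b"customer-123")
    .build()
)
```

Returning `self` from each setter is what makes the chain work; deferring validation to `build()` keeps partially configured builders legal until the last step.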

Example Testing Architecture

Example implementations use per-language Gherkin harnesses:

illustrative - directory structure
examples/features/unit/*.feature  (shared specifications)

├── Python: behave + examples/python/features/steps/
├── Go: godog + examples/go/tests/steps/
├── Rust: cucumber-rs + examples/rust/tests/
├── Java: cucumber-junit5 + examples/java/tests/
├── C#: SpecFlow + examples/csharp/Tests/Steps/
└── C++: cucumber-cpp + examples/cpp/tests/

Why per-language?

  • Demonstrative for non-polyglot developers
  • Developers see Gherkin AND step definitions in their language
  • Educational code they can learn from and copy
illustrative
just examples python test  # behave
just examples go test      # godog
just examples rust test    # cucumber-rs

Unit Tests

No external dependencies. Tests interact only with the system under test—no I/O, no concurrency, no infrastructure.

Business Logic Only

Test business logic directly. No mocks. No frameworks. No infrastructure.

The guard/validate/compute pattern isolates business logic from everything else:

  • Pass in state structs directly
  • Assert on returned events
  • No database connections, no message buses, no HTTP clients

If you're writing mocks, you're testing the wrong thing. Business logic should be pure functions that take data in and return data out.

The guard/validate/compute Pattern

All aggregate command handlers follow a three-function pattern that makes business logic 100% unit testable:

illustrative - pseudocode
guard(state) → Result<()>
    Check state preconditions (aggregate exists, correct phase, etc.)
    Pure function: state in, Result out

validate(cmd, state) → Result<ValidatedData>
    Validate command inputs against current state
    Returns validated/transformed data needed by compute
    Pure function: command + state in, Result out

compute(cmd, state, validated) → Event
    Build the resulting event from inputs
    Pure function: no side effects, deterministic output
    All business calculations happen here

Why this matters:

  • guard(), validate(), compute() are pure functions—call directly in tests
  • No mocking required: pass state structs directly, assert on returned events
  • Each function has single responsibility, testable in isolation
  • Proto serialization tested separately from business logic
import pytest

# Project-specific imports (PlayerState, player, poker_types,
# CommandRejectedError, deposit_guard/validate/compute) are omitted here.


def test_deposit_increases_bankroll():
    """Test that deposit correctly calculates new balance."""
    state = PlayerState()
    state.player_id = "player_test@example.com"  # Makes exists return True
    state.bankroll = 1000

    cmd = player.DepositFunds(
        amount=poker_types.Currency(amount=500, currency_code="CHIPS")
    )

    # Third argument is the validated amount returned by deposit_validate
    event = deposit_compute(cmd, state, 500)

    assert event.new_balance.amount == 1500


def test_deposit_rejects_non_existent_player():
    """Test that deposit guard rejects non-existent player."""
    state = PlayerState()
    # player_id is empty by default, so exists returns False

    with pytest.raises(CommandRejectedError) as exc_info:
        deposit_guard(state)

    assert "does not exist" in str(exc_info.value)


def test_deposit_rejects_zero_amount():
    """Test that deposit validate rejects zero amount."""
    cmd = player.DepositFunds(
        amount=poker_types.Currency(amount=0, currency_code="CHIPS")
    )

    with pytest.raises(CommandRejectedError) as exc_info:
        deposit_validate(cmd)

    assert "positive" in str(exc_info.value)
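
For orientation, the functions exercised by those tests might look roughly like the sketch below. It is an assumption-laden stand-in, not the example repository's code: plain dataclasses replace the generated protobuf messages, and the `FundsDeposited` fields, the error messages, and `handle_deposit` are hypothetical.

```python
from dataclasses import dataclass, field


class CommandRejectedError(Exception):
    """Raised when a command violates a business rule."""


@dataclass
class Currency:
    amount: int = 0
    currency_code: str = "CHIPS"


@dataclass
class DepositFunds:
    amount: Currency = field(default_factory=Currency)


@dataclass
class FundsDeposited:
    amount: Currency
    new_balance: Currency


@dataclass
class PlayerState:
    player_id: str = ""
    bankroll: int = 0

    @property
    def exists(self):
        # An empty player_id means the aggregate was never created.
        return bool(self.player_id)


def deposit_guard(state):
    """State preconditions only: the player must already exist."""
    if not state.exists:
        raise CommandRejectedError("Player does not exist")


def deposit_validate(cmd):
    """Command input checks; returns the validated amount for compute."""
    if cmd.amount.amount <= 0:
        raise CommandRejectedError("Deposit amount must be positive")
    return cmd.amount.amount


def deposit_compute(cmd, state, amount):
    """Pure calculation: builds the event, touches nothing else."""
    return FundsDeposited(
        amount=cmd.amount,
        new_balance=Currency(state.bankroll + amount, cmd.amount.currency_code),
    )


def handle_deposit(cmd, state):
    """Thin composition of the three steps; no I/O anywhere."""
    deposit_guard(state)
    amount = deposit_validate(cmd)
    return deposit_compute(cmd, state, amount)
```

Because each step takes plain data and returns plain data, a test can call any one of them directly, or call `handle_deposit` to cover the composition, without a mock in sight.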


Test Naming

Use the pattern: test_<action>_<condition>_<expected_result>

| Language | Convention | Example |
|----------|------------|---------|
| Python | snake_case | test_deposit_with_zero_amount_raises_error |
| Rust | snake_case | test_deposit_with_zero_amount_raises_error |
| Go | CamelCase | TestDepositWithZeroAmountRaisesError |
| Java/C# | camelCase | depositWithZeroAmountRaisesError |

Prioritize readability over rigid format. Tests are documentation.

Event Sourcing: The Any Boundary

Events cross a serialization boundary between business logic and the framework:

illustrative - data flow diagram
Business Logic                    Framework
─────────────────────────────────────────────────────
compute(cmd, state) → raw event

                                  Any.Pack(event) → persist to EventBook

                                  EventBook.pages[].event (Any-wrapped)

build_state(state, events) ← extract events from pages

_apply_event(state, event_any)

event_any.Unpack(typed_event)

mutate state

The framework stores events as opaque Any blobs—it doesn't know business types. Business logic must decode the Any because only it knows PlayerRegistered, FundsDeposited, etc.

Full event sourcing test cycle:

illustrative
from google.protobuf.any_pb2 import Any


def test_deposit_full_cycle():
    # 1. Start with state
    state = PlayerState(bankroll=100)
    cmd = DepositFunds(amount=50)

    # 2. compute() produces raw event
    event = compute(cmd, state)

    # 3. Wrap in Any (what framework does for persistence)
    event_any = Any()
    event_any.Pack(event, type_url_prefix="type.googleapis.com/")

    # 4. build_state applies Any-wrapped events → new state
    new_state = build_state(state, [event_any])

    assert new_state.bankroll == 150

Tests mimic the production boundary exactly—no special test-only interfaces.
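
The decoding side of that boundary can be sketched without protobuf installed. In the block below `OpaqueAny` is a toy stand-in for `google.protobuf.Any` (a type URL plus opaque bytes, with JSON as the encoding), and the `APPLIERS` table and `apply_funds_deposited` are hypothetical names; the point is only that the business side picks the decoder from the type URL.

```python
import json
from dataclasses import asdict, dataclass, replace

TYPE_PREFIX = "type.googleapis.com/"


@dataclass(frozen=True)
class FundsDeposited:
    amount: int
    new_balance: int


@dataclass(frozen=True)
class PlayerState:
    bankroll: int = 0


@dataclass(frozen=True)
class OpaqueAny:
    """Toy stand-in for google.protobuf.Any: type URL plus opaque bytes."""
    type_url: str
    value: bytes


def pack(event):
    """What the framework side does: wrap a typed event as an opaque blob."""
    return OpaqueAny(
        TYPE_PREFIX + type(event).__name__,
        json.dumps(asdict(event)).encode(),
    )


def unpack(event_any, event_type):
    """What business logic does: decode the blob into the type it expects."""
    return event_type(**json.loads(event_any.value))


def apply_funds_deposited(state, event):
    return replace(state, bankroll=event.new_balance)


# Only business code knows which event types exist and how they mutate state.
APPLIERS = {"FundsDeposited": (FundsDeposited, apply_funds_deposited)}


def build_state(state, events):
    for event_any in events:
        name = event_any.type_url.rsplit("/", 1)[-1]
        if name not in APPLIERS:
            continue  # unknown event types are skipped in this sketch
        event_type, apply = APPLIERS[name]
        state = apply(state, unpack(event_any, event_type))
    return state
```

Skipping unknown types is a choice made for this sketch; whether the real framework skips, logs, or fails on them is not stated in this document.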


Integration Tests

Test Angzarr framework internals using synthetic aggregates (EchoAggregate, MultiEventAggregate). Prove the plumbing works—not business logic.

Infrastructure: In-process only. Uses RuntimeBuilder with SQLite and channel bus. No containers.

What they cover:

  • Event persistence and sequence numbering
  • IPC event bus (named pipes, domain filtering)
  • gRPC over UDS transport
  • Channel bus pub/sub delivery
  • Saga activation and cross-domain command routing
  • Snapshot/recovery, lossy bus resilience

Location: tests/standalone_integration/

Gherkin Example

tests/interfaces/features/event_bus.feature
Feature: EventBus interface
  The EventBus distributes committed events to interested subscribers. After
  an aggregate persists events, the bus broadcasts them to sagas, projectors,
  and process managers that need to react.

  Background:
    Given an EventBus backend

  Scenario: Handlers only receive events from their subscribed domain
    Given the player-projector subscribes only to the player domain
    When events are published to player and table domains
    Then the player-projector receives only player events
    And never sees table events which are filtered out by the bus

  Scenario: Cross-domain handlers can subscribe to multiple domains
    Given the output-projector subscribed to player and table domains
    When events are published to player, table, and hand domains
    Then the output-projector receives player events because it subscribed
    And the output-projector receives table events because it subscribed
    And the output-projector does NOT receive hand events because it did not subscribe

  Scenario: Events arrive in sequence order from a single publisher
    Given a single-threaded hand aggregate publishing events
    And a projector subscribed to hand
    When events with sequences 0, 1, 2, 3, 4 are published in order
    Then the projector receives them in sequence order: 0, 1, 2, 3, 4
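
The domain-filtering behavior these scenarios describe fits in a few lines. The `ChannelBus` below is a hypothetical, synchronous, in-memory sketch and not the Angzarr channel bus; it exists only to show what "subscribed domain" filtering and single-publisher ordering mean.

```python
from collections import defaultdict


class ChannelBus:
    """Toy in-memory bus: handlers receive only events from subscribed domains."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # domain -> list of handlers

    def subscribe(self, domains, handler):
        for domain in domains:
            self._subscribers[domain].append(handler)

    def publish(self, domain, event):
        # Delivery is synchronous, so one publisher's events arrive in order.
        for handler in self._subscribers.get(domain, []):
            handler(domain, event)


bus = ChannelBus()
player_projector, output_projector = [], []
bus.subscribe(["player"], lambda d, e: player_projector.append((d, e)))
bus.subscribe(["player", "table"], lambda d, e: output_projector.append((d, e)))

for sequence, domain in enumerate(["player", "table", "hand"]):
    bus.publish(domain, {"sequence": sequence})
```

After the loop, `player_projector` holds only the player event and `output_projector` holds the player and table events; the hand event reaches nobody because nothing subscribed to it.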

Contract Tests

Verify all storage implementations behave identically. Same interface, interchangeable backends.

Infrastructure: Testcontainers for every networked backend (real Postgres, Redis, NATS, immudb); SQLite runs in-process with no container. This is the only level that provisions containers.

What contracts verify:

  • EventStore: add, get, get_from, get_from_to, list_roots, sequence numbering
  • SnapshotStore: save, load, delete
  • PositionStore: get, set, checkpoint semantics

Location: tests/interfaces/

Gherkin Contract Specifications

Contract tests are written in Gherkin for readability. The same feature files run against every backend:

tests/interfaces/features/event_store.feature
Feature: EventStore interface
  The EventStore is the source of truth for all state changes in the system.
  Every aggregate's current state is derived by replaying its events. This
  immutability provides a complete audit trail, enables temporal queries, and
  allows the system to reconstruct any aggregate's state at any point in history.

  Background:
    Given an EventStore backend

  Scenario: First event in an aggregate's history starts at sequence 0
    Given an aggregate "player" with no events
    When I add 1 event to the aggregate
    Then the aggregate should have 1 event
    And the first event should have sequence 0

  Scenario: Multiple events from a single command receive consecutive sequences
    Given an aggregate "player" with no events
    When I add 5 events to the aggregate
    Then the aggregate should have 5 events
    And events should have consecutive sequences starting from 0

  Scenario: Concurrent writers are detected via sequence mismatch
    Given an aggregate "player" with 3 events
    When I try to add an event with sequence 1
    Then the operation should fail with a sequence conflict

  Scenario: Stale writers cannot overwrite history
    Given an aggregate "player" with 3 events
    When I try to add an event with sequence 0
    Then the operation should fail with a sequence conflict
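
The idea of one specification running against interchangeable backends can be sketched in plain Python. Both stores below are hypothetical stand-ins (a dict and an in-memory `sqlite3` database), and the `add`/`get_from` signatures are simplified assumptions rather than the real EventStore trait.

```python
import sqlite3


class SequenceConflict(Exception):
    """Raised when a write does not target the next expected sequence."""


class MemoryEventStore:
    def __init__(self):
        self._events = {}

    def add(self, root, sequence, payload):
        events = self._events.setdefault(root, [])
        if sequence != len(events):
            raise SequenceConflict(f"expected {len(events)}, got {sequence}")
        events.append((sequence, payload))

    def get_from(self, root, start=0):
        return [e for e in self._events.get(root, []) if e[0] >= start]


class SqliteEventStore:
    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE events (root TEXT, sequence INTEGER, payload TEXT, "
            "PRIMARY KEY (root, sequence))"
        )

    def add(self, root, sequence, payload):
        (count,) = self._db.execute(
            "SELECT COUNT(*) FROM events WHERE root = ?", (root,)
        ).fetchone()
        if sequence != count:
            raise SequenceConflict(f"expected {count}, got {sequence}")
        self._db.execute(
            "INSERT INTO events VALUES (?, ?, ?)", (root, sequence, payload)
        )

    def get_from(self, root, start=0):
        rows = self._db.execute(
            "SELECT sequence, payload FROM events "
            "WHERE root = ? AND sequence >= ? ORDER BY sequence",
            (root, start),
        )
        return list(rows)


def check_event_store_contract(make_store):
    """The same assertions run unchanged against any backend factory."""
    store = make_store()
    for sequence in range(3):
        store.add("player", sequence, f"event-{sequence}")
    assert [s for s, _ in store.get_from("player")] == [0, 1, 2]
    assert len(store.get_from("player", 1)) == 2
    for stale in (0, 1):  # stale and concurrent writers must both be rejected
        try:
            store.add("player", stale, "late")
        except SequenceConflict:
            pass
        else:
            raise AssertionError(f"sequence {stale} should conflict")
    return True


BACKENDS = {"memory": MemoryEventStore, "sqlite": SqliteEventStore}
results = {name: check_event_store_contract(make) for name, make in BACKENDS.items()}
```

The contract function never mentions a concrete store, so adding a backend means adding one entry to `BACKENDS`, not writing new tests.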

Running Contract Tests

illustrative
# Run against SQLite (fast, no containers)
STORAGE_BACKEND=sqlite cargo test --test interfaces --features sqlite

# Run against PostgreSQL (testcontainers)
STORAGE_BACKEND=postgres cargo test --test interfaces --features postgres

# Run against immudb (testcontainers)
STORAGE_BACKEND=immudb cargo test --test interfaces --features immudb

Contract tests ensure you can swap backends without behavior changes. If SQLite passes but Postgres fails, the Postgres implementation has a bug.

Testcontainers

Contract tests use testcontainers to provision real databases:

illustrative
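// Imports assume a recent testcontainers release (0.20 or later).
use testcontainers::{
    core::{IntoContainerPort, WaitFor},
    runners::AsyncRunner,
    ContainerAsync, GenericImage, ImageExt,
};

async fn start_postgres() -> (ContainerAsync<GenericImage>, String) {
    // Wait until Postgres logs readiness before handing the container to tests.
    let image = GenericImage::new("postgres", "16")
        .with_exposed_port(5432.tcp())
        .with_wait_for(WaitFor::message_on_stdout(
            "database system is ready",
        ));

    let container = image
        .with_env_var("POSTGRES_USER", "testuser")
        .with_env_var("POSTGRES_PASSWORD", "testpass")
        .with_env_var("POSTGRES_DB", "testdb")
        .start()
        .await
        .expect("Failed to start container");

    // The container port is mapped to a random free port on the host.
    let host_port = container.get_host_port_ipv4(5432).await.unwrap();
    let url = format!("postgres://testuser:testpass@localhost:{}/testdb", host_port);

    // Return the container too: dropping it stops the database.
    (container, url)
}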

Benefits:

  • Zero setup — tests start containers automatically
  • Isolation — each test gets fresh state
  • Realistic — tests run against real databases, not mocks

Acceptance Tests

Test business behavior through the full stack. Written in Gherkin, describing what the system does from a business perspective.

Location: examples/rust/e2e/tests/features/

Gherkin Authoring

Gherkin is business-readable specification, not test code. Describe what the system does and why it matters—never how.

The litmus test: "Will this wording change if the implementation changes?" If yes, abstract to behavior.

Declarative Over Imperative

illustrative
# Wrong: UI choreography
When I click "Add to Cart"
And I click "Checkout"
And I fill in "Card Number" with "4111..."

# Right: Business intent
When I purchase the items in my cart

Given-When-Then Semantics

| Keyword | Purpose | Example |
|---------|---------|---------|
| Given | Establish context (past state) | Given a player with $500 in their bankroll |
| When | Single triggering action | When the player reserves $200 for the table |
| Then | Verify business outcomes | Then the player's available balance is $300 |

Business Language

| Technical (Avoid) | Business (Prefer) |
|-------------------|-------------------|
| API returns 201 | Order is confirmed |
| Database has record | Customer exists |
| Event is published | Notification is sent |
| State machine transitions | Hand progresses to showdown |

Exception: Framework tests (event stores, buses) use technical vocabulary—it's their domain.

One Scenario, One Behavior

Each scenario tests exactly one thing. Multiple When-Then pairs = multiple scenarios.

Feature Preambles

Open features with context explaining what this capability enables, why it matters, and what breaks if it doesn't work:

illustrative
Feature: Player fund reservation

  Players must reserve funds when joining a table. This ensures:
  - Players can cover their buy-in before sitting down
  - Funds are locked and cannot be double-spent across tables

  Without fund reservation, players could join multiple tables with
  the same bankroll, creating settlement disputes.

Error Cases Are First-Class

Don't just test happy paths. Business rules live in constraints:

illustrative
Scenario: Cannot reserve more than available balance
  Given Alice has $500 available
  When Alice tries to reserve $600
  Then the request fails with "insufficient funds"
  And Alice's available balance remains $500
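
The rule behind that scenario is small enough to sketch as a pure function. `Bankroll`, `reserve`, and `InsufficientFunds` are hypothetical names for illustration, not types from the example domain.

```python
from dataclasses import dataclass


class InsufficientFunds(Exception):
    """Raised when a reservation exceeds the available balance."""


@dataclass(frozen=True)
class Bankroll:
    available: int
    reserved: int = 0


def reserve(bankroll, amount):
    """Return a new Bankroll with `amount` moved from available to reserved."""
    if amount > bankroll.available:
        raise InsufficientFunds("insufficient funds")
    return Bankroll(bankroll.available - amount, bankroll.reserved + amount)


alice = Bankroll(available=500)
after_reserve = reserve(alice, 200)
```

Because `Bankroll` is frozen and `reserve` returns a new value, a rejected request cannot leave a half-updated balance behind, which is exactly what the final `And` step asserts.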

Cross-Domain Scenarios

Show saga/PM translations explicitly without exposing implementation:

illustrative
Scenario: Order completion triggers fulfillment
  Given an order with items:
    | sku    | quantity |
    | WIDGET | 3        |
  When the order is completed
  Then a fulfillment request is created with:
    | sku    | quantity |
    | WIDGET | 3        |
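
Behind a scenario like this, the saga step is a translation from one domain's event to another domain's command. The sketch below uses hypothetical message types (`OrderCompleted`, `CreateFulfillment`, `LineItem`) to show that the translation itself can be a pure function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    sku: str
    quantity: int


@dataclass(frozen=True)
class OrderCompleted:
    """Event emitted by the order domain (hypothetical shape)."""
    order_id: str
    items: tuple


@dataclass(frozen=True)
class CreateFulfillment:
    """Command sent to the fulfillment domain (hypothetical shape)."""
    order_id: str
    items: tuple


def order_to_fulfillment(event):
    """Saga step: translate an order event into a fulfillment command."""
    return CreateFulfillment(order_id=event.order_id, items=event.items)


completed = OrderCompleted("order-001", (LineItem("WIDGET", 3),))
request = order_to_fulfillment(completed)
```

The Gherkin stays silent about sagas and buses; it only states that the same items appear on both sides, which is the one property this function has to preserve.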

Shared Feature Files

The same Gherkin scenarios validate all language implementations:

illustrative - directory structure
examples/
├── features/                    # Shared Gherkin (canonical source)
│   ├── player.feature
│   ├── table.feature
│   └── hand.feature
├── python/features/             # Symlinks to ../features/
├── go/features/                 # Symlinks to ../features/
└── rust/e2e/tests/features/     # Symlinks

Running Acceptance Tests

illustrative
# Rust uses cucumber-rs
cargo test --package e2e --test acceptance

# Run specific tags
cargo test --package e2e --test acceptance -- --tags @player

Two Execution Modes

Acceptance tests support two backends via trait abstraction:

| Mode | Description | Infrastructure | Use Case |
|------|-------------|----------------|----------|
| Standalone (default) | In-process RuntimeBuilder | Channel bus + SQLite | Fast local development |
| Direct | Remote gRPC against deployed cluster | NATS + Postgres (or configured) | K8s validation |

Standalone mode uses test infrastructure, not production tooling:

  • Channel bus (in-memory) instead of NATS JetStream
  • SQLite (in-memory) instead of Postgres
  • Single process, no containers required

This trades production fidelity for speed. Use Direct mode to validate against real infrastructure.

illustrative
# Standalone (default) — channel bus + SQLite, in-process
cargo test --package e2e --test acceptance

# Direct mode — real infrastructure via gRPC
ANGZARR_TEST_MODE=direct \
ANGZARR_ORDER_ENDPOINT=http://localhost:1310 \
cargo test --package e2e --test acceptance
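
The mode switch can be pictured as one interface with two implementations chosen from the environment. The sketch below uses a Python `Protocol` in place of the Rust trait; the class names, the `execute` method, and the value `standalone` for `ANGZARR_TEST_MODE` are assumptions (only `direct` appears in the commands above), and the backends return strings where real ones would run commands.

```python
import os
from typing import Protocol


class Backend(Protocol):
    """What step definitions depend on; they never see the concrete mode."""

    def execute(self, command: str) -> str: ...


class StandaloneBackend:
    def execute(self, command):
        # A real implementation would drive an in-process runtime.
        return f"in-process:{command}"


class DirectBackend:
    def __init__(self, endpoint):
        self.endpoint = endpoint

    def execute(self, command):
        # A real implementation would issue a gRPC call to the endpoint.
        return f"{self.endpoint}:{command}"


def select_backend(env=None):
    """Pick a backend from environment variables; standalone is the default."""
    env = os.environ if env is None else env
    mode = env.get("ANGZARR_TEST_MODE", "standalone")
    if mode == "direct":
        return DirectBackend(env["ANGZARR_ORDER_ENDPOINT"])
    if mode == "standalone":
        return StandaloneBackend()
    raise ValueError(f"unknown test mode: {mode}")


local = select_backend({})
remote = select_backend(
    {
        "ANGZARR_TEST_MODE": "direct",
        "ANGZARR_ORDER_ENDPOINT": "http://localhost:1310",
    }
)
```

Passing the environment as a parameter keeps the selection logic itself unit testable, without mutating `os.environ` in tests.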

Running Tests

illustrative
# Unit tests
just test

# Integration tests
just integration

# Acceptance tests
just acceptance

# Contract tests (all backends)
just test-interfaces-all

Direct commands

illustrative
# Unit tests
cargo test --lib

# Integration tests (in-process, no containers)
cargo test --test standalone_integration --features sqlite

# Contract tests (testcontainers for real backends)
STORAGE_BACKEND=sqlite cargo test --test interfaces --features sqlite
STORAGE_BACKEND=postgres cargo test --test interfaces --features postgres

# Acceptance tests
cargo test --package e2e --test acceptance

Anti-Patterns

| Anti-Pattern | Problem | Fix |
|--------------|---------|-----|
| UI steps in Gherkin | "click", "fill in", "navigate" | Use business intent |
| Technical assertions | "database has row", "event published" | Use business outcomes |
| Conditional logic | "if valid then X else Y" | Separate scenarios |
| Vague outcomes | "works correctly" | Be specific |
| Hardcoded test data | Magic numbers everywhere | Use meaningful descriptions |
| Skipping TDD | Tests written after code | Write test first, watch it fail |
| Testing mocks | Mock everything | Test real implementations |

Definition of Done

A task is complete when:

  1. Implementation code exists
  2. Tests exist and exercise the implementation
  3. Tests actually run (not just specifications)
  4. Tests pass (cargo test, pytest, etc.)
  5. For Gherkin: step definitions implemented and runner passes

"Tests pass" means running the actual test command, not just writing test files.