Testing Strategy
Nothing is "done" until tests prove it works. Writing code without runnable tests is incomplete work.
TDD Workflow
Test-Driven Development is mandatory. Follow the Red-Green-Refactor cycle:
| Phase | Action |
|---|---|
| Red | Write a failing test first. Verify it fails for the right reason. Ensure test isolation. |
| Green | Write minimal code to pass. No extras, no premature optimization. |
| Refactor | Clean up while keeping tests green. Apply SOLID principles. Remove duplication. |
Critical: If a test fails, fix the issue and create a new commit. Never amend commits that have been pushed.
Four Levels of Testing
| Level | Scope | Speed | Infrastructure |
|---|---|---|---|
| Unit | Single function/module | Fast (ms) | None |
| Integration | Framework plumbing | Fast (ms) | In-process (RuntimeBuilder, SQLite, channels) |
| Contract | Interface compliance | Medium (s) | Testcontainers |
| Acceptance | Business behavior | Medium (s) | Channel + SQLite (standalone) or full stack (direct) |
Why Cucumber/Gherkin
Angzarr uses Cucumber/Gherkin extensively—not just for acceptance tests, but also for contract tests and integration tests. The primary motivation is readability.
Ease of Reading
Gherkin specifications are easier to read than programmatic tests:
# Gherkin: Intent is immediately clear
Scenario: Events persist with correct sequence numbers
Given an empty event store
When I append 3 events to aggregate "player-123"
Then the events have sequences 0, 1, 2
And querying from sequence 1 returns 2 events
// Programmatic: Requires understanding test framework idioms
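#[tokio::test]
async fn test_events_persist_with_correct_sequence_numbers() {
    let store = setup_store().await;
    let root = uuid::Uuid::new_v4();
    // Append three events to a fresh aggregate root.
    for i in 0..3 {
        store.append(&root, make_event(i)).await.unwrap();
    }
    // The events carry sequences 0, 1, 2.
    let all = store.get_from(&root, 0).await.unwrap();
    let sequences: Vec<_> = all.iter().map(|e| e.sequence).collect();
    assert_eq!(sequences, vec![0, 1, 2]);
    // Querying from sequence 1 returns the last two events.
    let events = store.get_from(&root, 1).await.unwrap();
    assert_eq!(events.len(), 2);
    assert_eq!(events[0].sequence, 1);
}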
Both test the same thing, but the Gherkin version:
- Documents behavior in plain English
- Serves as living specification
- Is reviewable by non-developers
- Makes test coverage gaps obvious
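A Gherkin runner works by matching each step line against patterns registered by step definitions. It then calls the matching function with the captured arguments. The sketch below is a hypothetical, framework-free toy (not the cucumber-rs, behave, or godog API). It runs the event-store scenario against an in-memory dictionary to show how plain-English steps bind to code.

```python
import re

# Hypothetical step registry: (compiled pattern, handler) pairs.
STEPS = []

def step(pattern):
    """Register a handler for step text that fully matches `pattern`."""
    def register(fn):
        STEPS.append((re.compile(pattern + r"$"), fn))
        return fn
    return register

@step(r"an empty event store")
def given_empty_store(ctx):
    ctx["store"] = {}

@step(r'I append (\d+) events to aggregate "([^"]+)"')
def when_append(ctx, count, root):
    events = ctx["store"].setdefault(root, [])
    for _ in range(int(count)):
        events.append({"sequence": len(events)})  # next sequence = history length
    ctx["root"] = root

@step(r"the events have sequences ([\d, ]+)")
def then_sequences(ctx, expected):
    wanted = [int(s) for s in expected.split(",")]
    actual = [e["sequence"] for e in ctx["store"][ctx["root"]]]
    assert actual == wanted, f"{actual} != {wanted}"

@step(r"querying from sequence (\d+) returns (\d+) events")
def then_query_from(ctx, start, count):
    events = [e for e in ctx["store"][ctx["root"]] if e["sequence"] >= int(start)]
    assert len(events) == int(count)

def run_scenario(text):
    """Strip the Given/When/Then/And keyword and dispatch to the first matching step."""
    ctx = {}
    for line in text.strip().splitlines():
        body = re.sub(r"^\s*(Given|When|Then|And)\s+", "", line)
        for pattern, fn in STEPS:
            match = pattern.match(body)
            if match:
                fn(ctx, *match.groups())
                break
        else:
            raise LookupError(f"undefined step: {body!r}")
    return ctx

SCENARIO = """
Given an empty event store
When I append 3 events to aggregate "player-123"
Then the events have sequences 0, 1, 2
And querying from sequence 1 returns 2 events
"""

run_scenario(SCENARIO)
```

Real runners add step-expression syntax, hooks, world objects and reporting on top of this. The core idea is the same: the feature file carries the specification, and the step definitions carry the code.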
Where We Use Cucumber
| Component | Test Type | Harness | Why |
|---|---|---|---|
| Core (src/, tests/) | Contract, Integration | Rust cucumber-rs | Readable specs for framework behavior |
| Clients (client/{lang}/) | Contract | Unified Rust gRPC harness | One source of truth across 6 languages |
| Examples (examples/{lang}/) | Acceptance | Per-language harnesses | Demonstrative for developers |
Core Testing with Cucumber
Core framework tests use Gherkin for readability, not cross-language consistency:
tests/
├── interfaces/
│ ├── features/
│ │ ├── event_store.feature # EventStore contract
│ │ ├── snapshot_store.feature # SnapshotStore contract
│ │ ├── position_store.feature # PositionStore contract
│ │ ├── event_bus.feature # EventBus contract
│ │ └── dlq.feature # DLQ behavior
│ └── steps/ # Rust step definitions
├── client/
│ ├── features/
│ │ ├── aggregate-client.feature # Client SDK contracts
│ │ ├── command-builder.feature
│ │ └── query-client.feature
│ └── (Rust harness calls clients via gRPC)
└── acceptance/
└── features/
└── end_to_end.feature # Full stack scenarios
Client Testing Architecture
Client libraries are tested with a unified Rust gRPC harness:
┌───────────────────────────────────────────────────────────────┐
│ Rust Gherkin Harness (cucumber-rs)                            │
│ - Step definitions: tests/client/                             │
│ - Feature files: client/features/*.feature                    │
└───────────────────────────────┬───────────────────────────────┘
                                │ gRPC
    ┌──────────┬──────────┬─────┴────┬──────────┬──────────┐
    ▼          ▼          ▼          ▼          ▼          ▼
┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│ Python │ │   Go   │ │  Rust  │ │  Java  │ │   C#   │ │  C++   │
│ Client │ │ Client │ │ Client │ │ Client │ │ Client │ │ Client │
└────────┘ └────────┘ └────────┘ └────────┘ └────────┘ └────────┘
Why unified?
- One source of truth for SDK contracts
- Same tests validate all 6 language implementations
- Tests actual gRPC protocol, not internal APIs
Client Contract Examples
# docs:start:aggregate_client_contract
Feature: AggregateClient - Command Execution
The AggregateClient sends commands to aggregates for processing.
Commands are validated, processed, and result in events being persisted.
Supports async (fire-and-forget), sync, and speculative modes.
Without command execution, the system cannot accept user actions or
change aggregate state.
# docs:end:aggregate_client_contract
# docs:start:client_command
Scenario: Execute command on new aggregate
Given a new aggregate root in domain "orders"
When I execute a "CreateOrder" command with data "customer-123"
Then the command should succeed
And the response should contain 1 event
And the event should have type "OrderCreated"
Scenario: Execute command on existing aggregate
Given an aggregate "orders" with root "order-001" at sequence 3
When I execute a "AddItem" command at sequence 3
Then the command should succeed
And the response should contain events starting at sequence 3
# docs:end:client_command
# docs:start:client_concurrency
Scenario: Command at wrong sequence fails with precondition error
Given an aggregate "orders" with root "order-002" at sequence 5
When I execute a command at sequence 3
Then the command should fail with precondition error
And the error should indicate sequence mismatch
Scenario: Concurrent writes are detected
Given an aggregate "orders" with root "order-003" at sequence 0
When two commands are sent concurrently at sequence 0
Then one should succeed
And one should fail with precondition error
# docs:end:client_concurrency
# docs:start:command_builder_contract
Feature: CommandBuilder - Fluent Command Construction
The CommandBuilder provides a fluent API for constructing commands.
It handles serialization, correlation IDs, sequence numbers, and
type URLs while providing compile-time and runtime validation.
The builder pattern enables both OO-style client usage and can be
adapted for router-based implementations.
# docs:end:command_builder_contract
Scenario: Build command with all required fields
When I build a command for domain "orders" root "order-001"
And I set the command type to "CreateOrder"
And I set the command payload
Then the built command should have domain "orders"
And the built command should have root "order-001"
And the built command should have type URL containing "CreateOrder"
Scenario: Build with explicit correlation ID
When I build a command for domain "orders"
And I set correlation ID to "trace-123"
And I set the command type and payload
Then the built command should have correlation ID "trace-123"
Scenario: Builder methods can be chained
When I build a command using fluent chaining:
"""
client.command("orders", root)
.with_correlation_id("trace-456")
.with_sequence(3)
.with_command("CreateOrder", payload)
.build()
"""
Then the build should succeed
And all chained values should be preserved
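A fluent builder of this kind can be sketched as follows. This is a hypothetical Python illustration, not the Angzarr client SDK. The class names, the use of the bare command name after the type.googleapis.com/ prefix, the auto-generated root and correlation IDs, and the validation rules are assumptions chosen to mirror the scenarios above.

```python
import uuid
from dataclasses import dataclass

@dataclass(frozen=True)
class Command:
    domain: str
    root: str
    correlation_id: str
    sequence: int
    type_url: str
    payload: bytes

class CommandBuilder:
    """Hypothetical fluent builder: each with_* method returns self for chaining."""

    def __init__(self, domain, root=None):
        self._domain = domain
        self._root = root or str(uuid.uuid4())  # assumption: new root if omitted
        self._correlation_id = None
        self._sequence = 0
        self._type_name = None
        self._payload = None

    def with_correlation_id(self, correlation_id):
        self._correlation_id = correlation_id
        return self

    def with_sequence(self, sequence):
        if sequence < 0:
            raise ValueError("sequence must be non-negative")
        self._sequence = sequence
        return self

    def with_command(self, type_name, payload):
        self._type_name = type_name
        self._payload = payload
        return self

    def build(self):
        # Runtime validation: a command without a type cannot be routed.
        if not self._type_name:
            raise ValueError("command type is required")
        return Command(
            domain=self._domain,
            root=self._root,
            # assumption: generate a correlation ID when none is supplied
            correlation_id=self._correlation_id or str(uuid.uuid4()),
            sequence=self._sequence,
            type_url="type.googleapis.com/" + self._type_name,
            payload=self._payload or b"",
        )

cmd = (
    CommandBuilder("orders", "order-001")
    .with_correlation_id("trace-456")
    .with_sequence(3)
    .with_command("CreateOrder", b"payload-bytes")
    .build()
)
print(cmd.type_url)  # type.googleapis.com/CreateOrder
```

Returning an immutable Command from build() keeps the mutable builder state out of the transport layer. The "all chained values should be preserved" scenario then reduces to field comparisons on the result.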
just test-client python # Test Python client via Rust harness
just test-client go # Test Go client via Rust harness
just test-clients # Test all clients
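The AggregateClient concurrency scenarios set two requirements: a command at a stale sequence fails with a precondition error, and of two concurrent writes at sequence 0, exactly one succeeds. Both come down to an atomic compare-and-append on the aggregate's next sequence.

The following is a minimal in-memory sketch. It assumes a hypothetical SequencedStore with an append(root, expected_sequence, events) method. A database-backed store would typically enforce the same check with a transaction or a unique constraint on (root, sequence).

```python
import threading

class PreconditionError(Exception):
    """Raised when the caller's expected sequence does not match the store."""

class SequencedStore:
    def __init__(self):
        self._events = {}
        self._lock = threading.Lock()

    def next_sequence(self, root):
        with self._lock:
            return len(self._events.get(root, []))

    def append(self, root, expected_sequence, events):
        # Check and append under one lock so the check cannot go stale.
        with self._lock:
            history = self._events.setdefault(root, [])
            if expected_sequence != len(history):
                raise PreconditionError(
                    f"sequence mismatch: expected {expected_sequence}, "
                    f"aggregate is at {len(history)}"
                )
            history.extend(events)
            return list(range(expected_sequence, expected_sequence + len(events)))

def send_concurrently(store, root, sequence, n=2):
    """Fire n writers at the same expected sequence; return (successes, failures)."""
    barrier = threading.Barrier(n)
    results = []
    results_lock = threading.Lock()

    def writer(i):
        barrier.wait()  # release all writers at the same moment
        try:
            store.append(root, sequence, [f"event-from-writer-{i}"])
            outcome = "ok"
        except PreconditionError:
            outcome = "precondition"
        with results_lock:
            results.append(outcome)

    threads = [threading.Thread(target=writer, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results.count("ok"), results.count("precondition")

print(send_concurrently(SequencedStore(), "order-003", 0))  # (1, 1)
```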
Example Testing Architecture
Example implementations use per-language Gherkin harnesses:
examples/features/unit/*.feature (shared specifications)
│
├── Python: behave + examples/python/features/steps/
├── Go: godog + examples/go/tests/steps/
├── Rust: cucumber-rs + examples/rust/tests/
├── Java: cucumber-junit5 + examples/java/tests/
├── C#: SpecFlow + examples/csharp/Tests/Steps/
└── C++: cucumber-cpp + examples/cpp/tests/
Why per-language?
- Demonstrative for non-polyglot developers
- Developers see Gherkin AND step definitions in their language
- Educational code they can learn from and copy
just examples python test # behave
just examples go test # godog
just examples rust test # cucumber-rs
Unit Tests
No external dependencies. Tests interact only with the system under test—no I/O, no concurrency, no infrastructure.
Business Logic Only
Test business logic directly. No mocks. No frameworks. No infrastructure.
The guard/validate/compute pattern isolates business logic from everything else:
- Pass in state structs directly
- Assert on returned events
- No database connections, no message buses, no HTTP clients
If you're writing mocks, you're testing the wrong thing. Business logic should be pure functions that take data in and return data out.
The guard/validate/compute Pattern
All aggregate command handlers follow a three-function pattern that makes business logic 100% unit testable:
guard(state) → Result<()>
Check state preconditions (aggregate exists, correct phase, etc.)
Pure function: state in, Result out
validate(cmd, state) → Result<ValidatedData>
Validate command inputs against current state
Returns validated/transformed data needed by compute
Pure function: command + state in, Result out
compute(cmd, state, validated) → Event
Build the resulting event from inputs
Pure function: no side effects, deterministic output
All business calculations happen here
Why this matters:
- guard(), validate(), and compute() are pure functions: call them directly in tests
- No mocking required: pass state structs directly, assert on returned events
- Each function has single responsibility, testable in isolation
- Proto serialization tested separately from business logic
The same tests in Python, Go, and Rust:
import pytest
# PlayerState, player, poker_types, CommandRejectedError, and the deposit_*
# functions are assumed to be imported from the player example (paths omitted).
def test_deposit_increases_bankroll():
    """Test that deposit correctly calculates new balance."""
    state = PlayerState()
    state.player_id = "player_test@example.com"  # Makes exists return True
    state.bankroll = 1000
    cmd = player.DepositFunds(
        amount=poker_types.Currency(amount=500, currency_code="CHIPS")
    )
    event = deposit_compute(cmd, state, 500)
    assert event.new_balance.amount == 1500
def test_deposit_rejects_non_existent_player():
    """Test that deposit guard rejects non-existent player."""
    state = PlayerState()
    # player_id is empty by default, so exists returns False
    with pytest.raises(CommandRejectedError) as exc_info:
        deposit_guard(state)
    assert "does not exist" in str(exc_info.value)
def test_deposit_rejects_zero_amount():
    """Test that deposit validate rejects zero amount."""
    cmd = player.DepositFunds(
        amount=poker_types.Currency(amount=0, currency_code="CHIPS")
    )
    with pytest.raises(CommandRejectedError) as exc_info:
        deposit_validate(cmd)
    assert "positive" in str(exc_info.value)
func TestDepositIncreasesBankroll(t *testing.T) {
	state := PlayerState{
		PlayerID: "player_1",
		Bankroll: 1000,
	}
	cmd := &examples.DepositFunds{
		Amount: &examples.Currency{Amount: 500, CurrencyCode: "CHIPS"},
	}
	event := computeFundsDeposited(cmd, state, 500)
	assert.Equal(t, int64(1500), event.NewBalance.Amount)
}
func TestDepositRejectsNonExistentPlayer(t *testing.T) {
	state := PlayerState{} // PlayerID empty = doesn't exist
	err := guardDepositFunds(state)
	assert.Error(t, err)
	assert.Contains(t, err.Error(), "does not exist")
}
func TestDepositRejectsZeroAmount(t *testing.T) {
	cmd := &examples.DepositFunds{
		Amount: &examples.Currency{Amount: 0, CurrencyCode: "CHIPS"},
	}
	_, err := validateDepositFunds(cmd)
	assert.Error(t, err)
	assert.Contains(t, err.Error(), "positive")
}
#[cfg(test)]
mod tests {
    use super::*;
    #[test]
    fn test_deposit_increases_bankroll() {
        let state = PlayerState {
            player_id: "player_1".to_string(),
            bankroll: 1000,
            ..Default::default()
        };
        let cmd = DepositFunds {
            amount: Some(Currency {
                amount: 500,
                currency_code: "CHIPS".to_string(),
            }),
        };
        let event = compute(&cmd, &state, 500);
        assert_eq!(event.new_balance.unwrap().amount, 1500);
    }
    #[test]
    fn test_deposit_rejects_non_existent_player() {
        let state = PlayerState::default(); // player_id empty = doesn't exist
        let result = guard(&state);
        assert!(result.is_err());
        assert!(result.unwrap_err().reason.contains("does not exist"));
    }
    #[test]
    fn test_deposit_rejects_zero_amount() {
        let cmd = DepositFunds {
            amount: Some(Currency {
                amount: 0,
                currency_code: "CHIPS".to_string(),
            }),
        };
        let result = validate(&cmd);
        assert!(result.is_err());
        assert!(result.unwrap_err().reason.contains("positive"));
    }
}
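The tests above exercise each function in isolation. A command handler then chains them, so the same pure functions run unchanged in production. The sketch below is a self-contained Python illustration in which plain dataclasses stand in for the generated proto types. handle_deposit is a hypothetical name, not the framework's dispatch API. As in the examples above, validate takes only the command, because the deposit check does not need state.

```python
from dataclasses import dataclass

class CommandRejectedError(Exception):
    pass

@dataclass
class PlayerState:
    player_id: str = ""
    bankroll: int = 0

    @property
    def exists(self):
        return bool(self.player_id)

@dataclass
class DepositFunds:
    amount: int

@dataclass
class FundsDeposited:
    amount: int
    new_balance: int

def deposit_guard(state):
    # State precondition: the player must already be registered.
    if not state.exists:
        raise CommandRejectedError("Player does not exist")

def deposit_validate(cmd):
    # Input validation: returns the validated amount for compute().
    if cmd.amount <= 0:
        raise CommandRejectedError("Deposit amount must be positive")
    return cmd.amount

def deposit_compute(cmd, state, amount):
    # Pure calculation: no I/O, and the same inputs always give the same event.
    return FundsDeposited(amount=amount, new_balance=state.bankroll + amount)

def handle_deposit(cmd, state):
    """Hypothetical handler: guard, then validate, then compute."""
    deposit_guard(state)
    amount = deposit_validate(cmd)
    return deposit_compute(cmd, state, amount)

event = handle_deposit(DepositFunds(amount=500), PlayerState("player_1", 1000))
print(event.new_balance)  # 1500
```

The handler contains no logic of its own. A test of handle_deposit is therefore only a wiring check, and business rules stay covered by the per-function tests.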
Test Naming
Use the pattern: test_<action>_<condition>_<expected_result>
| Language | Convention | Example |
|---|---|---|
| Python | snake_case | test_deposit_with_zero_amount_raises_error |
| Rust | snake_case | test_deposit_with_zero_amount_raises_error |
| Go | CamelCase | TestDepositWithZeroAmountRaisesError |
| Java | camelCase | depositWithZeroAmountRaisesError |
| C# | PascalCase | DepositWithZeroAmountRaisesError |
Prioritize readability over rigid format. Tests are documentation.
Event Sourcing: The Any Boundary
Events cross a serialization boundary between business logic and the framework:
Business Logic Framework
─────────────────────────────────────────────────────
compute(cmd, state) → raw event
↓
Any.Pack(event) → persist to EventBook
↓
EventBook.pages[].event (Any-wrapped)
↓
build_state(state, events) ← extract events from pages
↓
_apply_event(state, event_any)
↓
event_any.Unpack(typed_event)
↓
mutate state
The framework stores events as opaque Any blobs—it doesn't know business types. Business logic must decode the Any because only it knows PlayerRegistered, FundsDeposited, etc.
Full event sourcing test cycle:
from google.protobuf.any_pb2 import Any
def test_deposit_full_cycle():
    # 1. Start with state
    state = PlayerState(bankroll=100)
    cmd = DepositFunds(amount=50)
    # 2. compute() produces raw event
    event = compute(cmd, state)
    # 3. Wrap in Any (what framework does for persistence)
    event_any = Any()
    event_any.Pack(event, type_url_prefix="type.googleapis.com/")
    # 4. build_state applies Any-wrapped events → new state
    new_state = build_state(state, [event_any])
    assert new_state.bankroll == 150
Tests mimic the production boundary exactly—no special test-only interfaces.
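On the business side, build_state and _apply_event amount to a fold over the Any-wrapped events that dispatches on the type URL. The sketch below avoids a protobuf dependency. It uses a hypothetical AnyEnvelope that carries a type URL plus serialized bytes (the same two fields google.protobuf.Any carries), and it encodes payloads as JSON instead of protobuf. The event names follow the text. Everything else is assumed for illustration.

```python
import json
from dataclasses import dataclass, replace

PREFIX = "type.googleapis.com/"

@dataclass(frozen=True)
class AnyEnvelope:
    """Stand-in for google.protobuf.Any: a type URL plus opaque bytes."""
    type_url: str
    value: bytes

def pack(type_name, fields):
    # Framework side: the payload becomes opaque bytes tagged with a type URL.
    return AnyEnvelope(PREFIX + type_name, json.dumps(fields).encode())

@dataclass(frozen=True)
class PlayerState:
    player_id: str = ""
    bankroll: int = 0

def _apply_event(state, event_any):
    # Business side: only this code knows which event types exist and how to decode them.
    type_name = event_any.type_url.removeprefix(PREFIX)
    data = json.loads(event_any.value)
    if type_name == "PlayerRegistered":
        return replace(state, player_id=data["player_id"])
    if type_name == "FundsDeposited":
        return replace(state, bankroll=data["new_balance"])
    # Unknown types are skipped here; a stricter design might raise instead.
    return state

def build_state(state, events):
    for event_any in events:
        state = _apply_event(state, event_any)
    return state

history = [
    pack("PlayerRegistered", {"player_id": "player_1"}),
    pack("FundsDeposited", {"amount": 500, "new_balance": 500}),
]
print(build_state(PlayerState(), history))  # PlayerState(player_id='player_1', bankroll=500)
```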
Integration Tests
Test Angzarr framework internals using synthetic aggregates (EchoAggregate, MultiEventAggregate). Prove the plumbing works—not business logic.
Infrastructure: In-process only. Uses RuntimeBuilder with SQLite and channel bus. No containers.
What they cover:
- Event persistence and sequence numbering
- IPC event bus (named pipes, domain filtering)
- gRPC over UDS transport
- Channel bus pub/sub delivery
- Saga activation and cross-domain command routing
- Snapshot/recovery, lossy bus resilience
Location: tests/standalone_integration/
Gherkin Example
Feature: EventBus interface
The EventBus distributes committed events to interested subscribers. After
an aggregate persists events, the bus broadcasts them to sagas, projectors,
and process managers that need to react.
Background:
Given an EventBus backend
Scenario: Handlers only receive events from their subscribed domain
Given the player-projector subscribes only to the player domain
When events are published to player and table domains
Then the player-projector receives only player events
And never sees table events which are filtered out by the bus
Scenario: Cross-domain handlers can subscribe to multiple domains
Given the output-projector subscribed to player and table domains
When events are published to player, table, and hand domains
Then the output-projector receives player events because it subscribed
And the output-projector receives table events because it subscribed
And the output-projector does NOT receive hand events because it did not subscribe
Scenario: Events arrive in sequence order from a single publisher
Given a single-threaded hand aggregate publishing events
And a projector subscribed to hand
When events with sequences 0, 1, 2, 3, 4 are published in order
Then the projector receives them in sequence order: 0, 1, 2, 3, 4
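The filtering and ordering in these scenarios can be sketched with a small in-process bus. Each handler registers the set of domains it cares about, and publish delivers only to matching handlers, in publish order. This is a hypothetical Python illustration of the semantics, not Angzarr's channel bus implementation.

```python
from collections import defaultdict

class ChannelBus:
    def __init__(self):
        self._subscribers = []             # (handler_name, domains) pairs
        self.received = defaultdict(list)  # handler_name -> delivered events

    def subscribe(self, handler_name, domains):
        self._subscribers.append((handler_name, frozenset(domains)))

    def publish(self, domain, sequence):
        # Synchronous delivery, so each handler sees events in publish order.
        for name, domains in self._subscribers:
            if domain in domains:  # domain filtering happens in the bus
                self.received[name].append((domain, sequence))

bus = ChannelBus()
bus.subscribe("player-projector", {"player"})
bus.subscribe("output-projector", {"player", "table"})
bus.subscribe("hand-projector", {"hand"})

for domain in ("player", "table", "hand"):
    bus.publish(domain, 0)
for seq in range(1, 5):
    bus.publish("hand", seq)

print(bus.received["player-projector"])  # [('player', 0)]
print(bus.received["output-projector"])  # [('player', 0), ('table', 0)]
print([s for _, s in bus.received["hand-projector"]])  # [0, 1, 2, 3, 4]
```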
Contract Tests
Verify all storage implementations behave identically. Same interface, interchangeable backends.
Infrastructure: Testcontainers for every server-backed store (real Postgres, Redis, NATS, immudb); SQLite runs in-process. This is the only level that uses containers.
What contracts verify:
- EventStore: add, get, get_from, get_from_to, list_roots, sequence numbering
- SnapshotStore: save, load, delete
- PositionStore: get, set, checkpoint semantics
Location: tests/interfaces/
Gherkin Contract Specifications
Contract tests are written in Gherkin for readability. The same feature files run against every backend:
Feature: EventStore interface
The EventStore is the source of truth for all state changes in the system.
Every aggregate's current state is derived by replaying its events. This
immutability provides a complete audit trail, enables temporal queries, and
allows the system to reconstruct any aggregate's state at any point in history.
Background:
Given an EventStore backend
Scenario: First event in an aggregate's history starts at sequence 0
Given an aggregate "player" with no events
When I add 1 event to the aggregate
Then the aggregate should have 1 event
And the first event should have sequence 0
Scenario: Multiple events from a single command receive consecutive sequences
Given an aggregate "player" with no events
When I add 5 events to the aggregate
Then the aggregate should have 5 events
And events should have consecutive sequences starting from 0
Scenario: Concurrent writers are detected via sequence mismatch
Given an aggregate "player" with 3 events
When I try to add an event with sequence 1
Then the operation should fail with a sequence conflict
Scenario: Stale writers cannot overwrite history
Given an aggregate "player" with 3 events
When I try to add an event with sequence 0
Then the operation should fail with a sequence conflict
Running Contract Tests
# Run against SQLite (fast, no containers)
STORAGE_BACKEND=sqlite cargo test --test interfaces --features sqlite
# Run against PostgreSQL (testcontainers)
STORAGE_BACKEND=postgres cargo test --test interfaces --features postgres
# Run against immudb (testcontainers)
STORAGE_BACKEND=immudb cargo test --test interfaces --features immudb
Contract tests ensure you can swap backends without behavior changes. If SQLite passes but Postgres fails, the Postgres implementation has a bug.
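The same-suite-for-every-backend idea fits in a few lines: write the contract once as a function of a store factory, then run it against each implementation. The sketch below is a hypothetical Python illustration with two toy backends. One is list-backed, and the other uses the standard library's in-memory sqlite3. It is not Angzarr's EventStore interface, and the add(root, sequence, payload) signature is an assumption.

```python
import sqlite3

class SequenceConflict(Exception):
    pass

class MemoryEventStore:
    def __init__(self):
        self._events = {}

    def add(self, root, sequence, payload):
        history = self._events.setdefault(root, [])
        if sequence != len(history):
            raise SequenceConflict(f"expected {len(history)}, got {sequence}")
        history.append(payload)

    def get(self, root):
        return list(enumerate(self._events.get(root, [])))

class SqliteEventStore:
    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        # As a backstop, the primary key rejects a duplicate (root, sequence) pair.
        self._db.execute(
            "CREATE TABLE events (root TEXT, sequence INTEGER, payload TEXT, "
            "PRIMARY KEY (root, sequence))"
        )

    def add(self, root, sequence, payload):
        (count,) = self._db.execute(
            "SELECT COUNT(*) FROM events WHERE root = ?", (root,)
        ).fetchone()
        if sequence != count:
            raise SequenceConflict(f"expected {count}, got {sequence}")
        self._db.execute("INSERT INTO events VALUES (?, ?, ?)", (root, sequence, payload))

    def get(self, root):
        return self._db.execute(
            "SELECT sequence, payload FROM events WHERE root = ? ORDER BY sequence",
            (root,),
        ).fetchall()

def event_store_contract(make_store):
    """One contract, run unchanged against every backend."""
    store = make_store()
    for seq in range(3):
        store.add("player", seq, f"e{seq}")
    assert [s for s, _ in store.get("player")] == [0, 1, 2]
    for stale in (0, 1):  # stale and concurrent writers both hit a sequence conflict
        try:
            store.add("player", stale, "late")
        except SequenceConflict:
            pass
        else:
            raise AssertionError(f"sequence {stale} should conflict")
    assert len(store.get("player")) == 3  # history is unchanged

for backend in (MemoryEventStore, SqliteEventStore):
    event_store_contract(backend)
    print(backend.__name__, "passes")
```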
Testcontainers
Contract tests use testcontainers to provision real databases:
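// Imports assume testcontainers-rs 0.20 or later.
use testcontainers::{
    core::{IntoContainerPort, WaitFor},
    runners::AsyncRunner,
    ContainerAsync, GenericImage, ImageExt,
};
/// Start a throwaway Postgres 16 container and return it with a connection URL.
/// Keep the returned container alive for the test: dropping it removes the container.
async fn start_postgres() -> (ContainerAsync<GenericImage>, String) {
    let image = GenericImage::new("postgres", "16")
        .with_exposed_port(5432.tcp())
        .with_wait_for(WaitFor::message_on_stdout(
            "database system is ready",
        ));
    let container = image
        .with_env_var("POSTGRES_USER", "testuser")
        .with_env_var("POSTGRES_PASSWORD", "testpass")
        .with_env_var("POSTGRES_DB", "testdb")
        .start()
        .await
        .expect("Failed to start container");
    // The container port is mapped to a random free host port; look it up.
    let host_port = container.get_host_port_ipv4(5432).await.unwrap();
    let url = format!("postgres://testuser:testpass@localhost:{}/testdb", host_port);
    (container, url)
}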
Benefits:
- Zero setup — tests start containers automatically
- Isolation — each test gets fresh state
- Realistic — tests run against real databases, not mocks
Acceptance Tests
Test business behavior through the full stack. Written in Gherkin, describing what the system does from a business perspective.
Location: examples/rust/e2e/tests/features/
Gherkin Authoring
Gherkin is business-readable specification, not test code. Describe what the system does and why it matters—never how.
The litmus test: "Will this wording change if the implementation changes?" If yes, abstract to behavior.
Declarative Over Imperative
# Wrong: UI choreography
When I click "Add to Cart"
And I click "Checkout"
And I fill in "Card Number" with "4111..."
# Right: Business intent
When I purchase the items in my cart
Given-When-Then Semantics
| Keyword | Purpose | Example |
|---|---|---|
| Given | Establish context (past state) | Given a player with $500 in their bankroll |
| When | Single triggering action | When the player reserves $200 for the table |
| Then | Verify business outcomes | Then the player's available balance is $300 |
Business Language
| Technical (Avoid) | Business (Prefer) |
|---|---|
| API returns 201 | Order is confirmed |
| Database has record | Customer exists |
| Event is published | Notification is sent |
| State machine transitions | Hand progresses to showdown |
Exception: Framework tests (event stores, buses) use technical vocabulary—it's their domain.
One Scenario, One Behavior
Each scenario tests exactly one thing. Multiple When-Then pairs = multiple scenarios.
Feature Preambles
Open features with context explaining what this capability enables, why it matters, and what breaks if it doesn't work:
Feature: Player fund reservation
Players must reserve funds when joining a table. This ensures:
- Players can cover their buy-in before sitting down
- Funds are locked and cannot be double-spent across tables
Without fund reservation, players could join multiple tables with
the same bankroll, creating settlement disputes.
Error Cases Are First-Class
Don't just test happy paths. Business rules live in constraints:
Scenario: Cannot reserve more than available balance
Given Alice has $500 available
When Alice tries to reserve $600
Then the request fails with "insufficient funds"
And Alice's available balance remains $500
Cross-Domain Scenarios
Show saga/PM translations explicitly without exposing implementation:
Scenario: Order completion triggers fulfillment
Given an order with items:
| sku | quantity |
| WIDGET | 3 |
When the order is completed
Then a fulfillment request is created with:
| sku | quantity |
| WIDGET | 3 |
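Behind such a scenario, the saga can be written as a pure translation from a source-domain event to target-domain commands. That keeps it unit testable in the same way as aggregate logic. The following is a hypothetical Python sketch. OrderCompleted, CreateFulfillment, and LineItem are assumed names matching the scenario, not types from the examples.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LineItem:
    sku: str
    quantity: int

@dataclass(frozen=True)
class OrderCompleted:
    order_id: str
    items: tuple

@dataclass(frozen=True)
class CreateFulfillment:
    order_id: str
    items: tuple

def order_fulfillment_saga(event):
    """Translate an order-domain event into fulfillment-domain commands.

    Pure function; the runtime is assumed to route the returned commands.
    """
    if not isinstance(event, OrderCompleted):
        return []  # events this saga does not translate produce no commands
    return [CreateFulfillment(order_id=event.order_id, items=event.items)]

commands = order_fulfillment_saga(
    OrderCompleted("order-001", (LineItem("WIDGET", 3),))
)
print(commands[0].items)  # (LineItem(sku='WIDGET', quantity=3),)
```

The scenario's two data tables map directly onto the event's items and the command's items. That is why the Gherkin can state the translation without naming the saga.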
Shared Feature Files
The same Gherkin scenarios validate all language implementations:
examples/
├── features/ # Shared Gherkin (canonical source)
│ ├── player.feature
│ ├── table.feature
│ └── hand.feature
├── python/features/ # Symlinks to ../features/
├── go/features/ # Symlinks to ../features/
└── rust/e2e/tests/features/ # Symlinks
Running Acceptance Tests
# Rust uses cucumber-rs
cargo test --package e2e --test acceptance
# Run specific tags
cargo test --package e2e --test acceptance -- --tags @player
# Python uses behave
cd examples/python
behave features/
# Run specific tags
behave features/ --tags=@player
# Go uses godog
cd examples/go
go test -v ./... --godog.tags=@player
Two Execution Modes
Acceptance tests support two backends via trait abstraction:
| Mode | Description | Infrastructure | Use Case |
|---|---|---|---|
| Standalone (default) | In-process RuntimeBuilder | Channel bus + SQLite | Fast local development |
| Direct | Remote gRPC against deployed cluster | NATS + Postgres (or configured) | K8s validation |
Standalone mode uses test infrastructure, not production tooling:
- Channel bus (in-memory) instead of NATS JetStream
- SQLite (in-memory) instead of Postgres
- Single process, no containers required
This trades production fidelity for speed. Use Direct mode to validate against real infrastructure.
# Standalone (default) — channel bus + SQLite, in-process
cargo test --package e2e --test acceptance
# Direct mode — real infrastructure via gRPC
ANGZARR_TEST_MODE=direct \
ANGZARR_ORDER_ENDPOINT=http://localhost:1310 \
cargo test --package e2e --test acceptance
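Selecting a backend from the environment can be sketched as below. The ANGZARR_TEST_MODE and ANGZARR_ORDER_ENDPOINT variables come from the commands above. The "standalone" value, the class names, and the error handling are hypothetical. The real harness is written in Rust behind a trait, and this Python sketch only illustrates the selection logic.

```python
import os
from typing import Mapping, Protocol

class Backend(Protocol):
    def describe(self) -> str: ...

class StandaloneBackend:
    """In-process runtime: channel bus + in-memory SQLite."""
    def describe(self):
        return "standalone: channel bus + SQLite (in-process)"

class DirectBackend:
    """Remote gRPC calls against a deployed cluster."""
    def __init__(self, order_endpoint):
        self.order_endpoint = order_endpoint

    def describe(self):
        return f"direct: gRPC to {self.order_endpoint}"

def select_backend(env: Mapping[str, str] = os.environ) -> Backend:
    # Assumption: an unset ANGZARR_TEST_MODE means standalone (the documented default).
    mode = env.get("ANGZARR_TEST_MODE", "standalone")
    if mode == "standalone":
        return StandaloneBackend()
    if mode == "direct":
        endpoint = env.get("ANGZARR_ORDER_ENDPOINT")
        if not endpoint:
            # Fail fast rather than silently testing against nothing.
            raise RuntimeError("direct mode requires ANGZARR_ORDER_ENDPOINT")
        return DirectBackend(endpoint)
    raise ValueError(f"unknown ANGZARR_TEST_MODE: {mode!r}")

print(select_backend({}).describe())
print(select_backend({"ANGZARR_TEST_MODE": "direct",
                      "ANGZARR_ORDER_ENDPOINT": "http://localhost:1310"}).describe())
```

Scenarios then depend only on the Backend interface, so the same feature files run unchanged in both modes.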
Running Tests
Using just (recommended)
# Unit tests
just test
# Integration tests
just integration
# Acceptance tests
just acceptance
# Contract tests (all backends)
just test-interfaces-all
Direct commands
# Unit tests
cargo test --lib
# Integration tests (in-process, no containers)
cargo test --test standalone_integration --features sqlite
# Contract tests (testcontainers for real backends)
STORAGE_BACKEND=sqlite cargo test --test interface_tests --features sqlite
STORAGE_BACKEND=postgres cargo test --test interface_tests --features postgres
# Acceptance tests
cargo test --package e2e --test acceptance
Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| UI steps in Gherkin | "click", "fill in", "navigate" | Use business intent |
| Technical assertions | "database has row", "event published" | Use business outcomes |
| Conditional logic | "if valid then X else Y" | Separate scenarios |
| Vague outcomes | "works correctly" | Be specific |
| Hardcoded test data | Magic numbers everywhere | Use meaningful descriptions |
| Skipping TDD | Tests written after code | Write test first, watch it fail |
| Testing mocks | Mock everything | Test real implementations |
Definition of Done
A task is complete when:
- Implementation code exists
- Tests exist and exercise the implementation
- Tests actually run (not just specifications)
- Tests pass (cargo test, pytest, etc.)
- For Gherkin: step definitions implemented and runner passes
"Tests pass" means running the actual test command, not just writing test files.