Testing Strategy
Nothing is “done” until tests prove it works. Writing code without runnable tests is incomplete work.
TDD Workflow
Test-Driven Development is mandatory. Follow the Red-Green-Refactor cycle:
flowchart LR
R[🔴 Red] --> G[🟢 Green] --> RF[🔵 Refactor] --> R
| Phase | Action |
|---|---|
| Red | Write a failing test first. Verify it fails for the right reason. Ensure test isolation. |
| Green | Write minimal code to pass. No extras, no premature optimization. |
| Refactor | Clean up while keeping tests green. Apply SOLID principles. Remove duplication. |
Critical: If a test fails, fix the issue and create a new commit. Never amend commits that have been pushed.
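As a minimal sketch of one Red-Green pass (the `deposit` function and its validation rule are hypothetical and not part of Angzarr): the two tests were written first and failed because `deposit` did not exist, then the smallest implementation that satisfies them was added.

```python
# Hypothetical example: `deposit` is illustrative, not an Angzarr API.

# Green: the minimal implementation, added only after the tests below
# had been seen to fail (Red) because `deposit` was not yet defined.
def deposit(bankroll: int, amount: int) -> int:
    if amount <= 0:
        raise ValueError("deposit amount must be positive")
    return bankroll + amount


# Red: these tests were written before the function above.
def test_deposit_increases_bankroll():
    assert deposit(1000, 500) == 1500


def test_deposit_with_zero_amount_raises_error():
    try:
        deposit(1000, 0)
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

A Refactor step would then change the shape of `deposit` without touching either test.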
Four Levels of Testing
flowchart TB
subgraph Acceptance["Acceptance Tests"]
A1[Business behavior]
A2[Full stack via Gherkin]
end
subgraph Contract["Contract Tests"]
C1[Interface compliance]
C2[Backend interchangeability]
end
subgraph Integration["Integration Tests"]
I1[Framework internals]
I2[In-process RuntimeBuilder]
end
subgraph Unit["Unit Tests"]
U1[Pure functions]
U2[No I/O]
end
Acceptance --> Contract --> Integration --> Unit
| Level | Scope | Speed | Infrastructure |
|---|---|---|---|
| Unit | Single function/module | Fast (ms) | None |
| Integration | Framework plumbing | Fast (ms) | In-process (RuntimeBuilder, SQLite, channels) |
| Contract | Interface compliance | Medium (s) | Testcontainers |
| Acceptance | Business behavior | Medium (s) | In-process channel + SQLite or full stack (direct) |
Why Cucumber/Gherkin
Angzarr uses Cucumber/Gherkin extensively: not just for acceptance tests, but also for contract tests and integration tests. The primary motivation is readability.
Ease of Reading
Gherkin specifications are easier to read than programmatic tests:

    # Gherkin: Intent is immediately clear
    Scenario: Events persist with correct sequence numbers
      Given an empty event store
      When I append 3 events to aggregate "player-123"
      Then the events have sequences 0, 1, 2
      And querying from sequence 1 returns 2 events

    // Programmatic: Requires understanding test framework idioms
    #[tokio::test]
    async fn test_events_persist_with_correct_sequence_numbers() {
        let store = setup_store().await;
        let root = uuid::Uuid::new_v4();

        for i in 0..3 {
            store.append(&root, make_event(i)).await.unwrap();
        }

        let events = store.get_from(&root, 1).await.unwrap();
        assert_eq!(events.len(), 2);
        assert_eq!(events[0].sequence, 1);
    }

Both test the same thing, but the Gherkin version:
- Documents behavior in plain English
- Serves as living specification
- Is reviewable by non-developers
- Makes test coverage gaps obvious
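To show how each Gherkin step in the scenario above can map onto code, here is a minimal Python sketch. `InMemoryEventStore` and its method names are assumptions made for illustration; they are not Angzarr's store API.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    sequence: int
    payload: str


@dataclass
class InMemoryEventStore:
    """Hypothetical stand-in for an event store, keyed by aggregate root."""
    _events: dict = field(default_factory=dict)

    def append(self, root: str, payload: str) -> Event:
        stream = self._events.setdefault(root, [])
        event = Event(sequence=len(stream), payload=payload)  # next free sequence
        stream.append(event)
        return event

    def get_from(self, root: str, sequence: int) -> list:
        return [e for e in self._events.get(root, []) if e.sequence >= sequence]


# Given an empty event store
store = InMemoryEventStore()
# When I append 3 events to aggregate "player-123"
for i in range(3):
    store.append("player-123", f"event-{i}")
# Then the events have sequences 0, 1, 2
sequences = [e.sequence for e in store.get_from("player-123", 0)]
# And querying from sequence 1 returns 2 events
tail = store.get_from("player-123", 1)
```

In a real harness each comment would become a step definition, and the assertions would live in the `Then` steps.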
Where We Use Cucumber
| Component | Test Type | Harness | Why |
|---|---|---|---|
| Core (src/, tests/) | Contract, Integration | Rust cucumber-rs | Readable specs for framework behavior |
| Clients (client/{lang}/) | Contract | Unified Rust gRPC harness | One source of truth across 6 languages |
| Examples (examples/{lang}/) | Acceptance | Per-language harnesses | Demonstrative for developers |
Core Testing with Cucumber
Core framework tests use Gherkin for readability, not cross-language consistency:

    tests/
    ├── interfaces/
    │   ├── features/
    │   │   ├── event_store.feature       # EventStore contract
    │   │   ├── snapshot_store.feature    # SnapshotStore contract
    │   │   ├── position_store.feature    # PositionStore contract
    │   │   ├── event_bus.feature         # EventBus contract
    │   │   └── dlq.feature               # DLQ behavior
    │   └── steps/                        # Rust step definitions
    ├── client/
    │   ├── features/
    │   │   ├── aggregate-client.feature  # Client SDK contracts
    │   │   ├── command-builder.feature
    │   │   └── query-client.feature
    │   └── (Rust harness calls clients via gRPC)
    └── acceptance/
        └── features/
            └── end_to_end.feature        # Full stack scenarios

Client Testing Architecture
Client libraries are tested with a unified Rust gRPC harness:
flowchart TB
H["Rust Gherkin Harness (cucumber-rs)<br/>- Step definitions: tests/client/<br/>- Feature files: client/features/*.feature"]
H -->|gRPC| Py[Python Client]
H -->|gRPC| Go[Go Client]
H -->|gRPC| Rs[Rust Client]
H -->|gRPC| Ja[Java Client]
H -->|gRPC| Cs[C# Client]
H -->|gRPC| Cpp[C++ Client]
Why unified?
- One source of truth for SDK contracts
- Same tests validate all 6 language implementations
- Tests actual gRPC protocol, not internal APIs
Client Contract Examples
    # docs:start:aggregate_client_contract
    Feature: AggregateClient - Command Execution
      The AggregateClient sends commands to aggregates for processing. Commands are
      validated, processed, and result in events being persisted. Supports async
      (fire-and-forget), sync, and speculative modes.

      Without command execution, the system cannot accept user actions or change
      aggregate state.
    # docs:end:aggregate_client_contract

      # docs:start:client_command
      Scenario: Execute command on new aggregate
        Given a new aggregate root in domain "orders"
        When I execute a "CreateOrder" command with data "customer-123"
        Then the command should succeed
        And the response should contain 1 event
        And the event should have type "OrderCreated"

      Scenario: Execute command on existing aggregate
        Given an aggregate "orders" with root "order-001" at sequence 3
        When I execute a "AddItem" command at sequence 3
        Then the command should succeed
        And the response should contain events starting at sequence 3
      # docs:end:client_command

      # docs:start:client_concurrency
      Scenario: Command at wrong sequence fails with precondition error
        Given an aggregate "orders" with root "order-002" at sequence 5
        When I execute a command at sequence 3
        Then the command should fail with precondition error
        And the error should indicate sequence mismatch

      Scenario: Concurrent writes are detected
        Given an aggregate "orders" with root "order-003" at sequence 0
        When two commands are sent concurrently at sequence 0
        Then one should succeed
        And one should fail with precondition error
      # docs:end:client_concurrency

    # docs:start:command_builder_contract
    Feature: CommandBuilder - Fluent Command Construction
      The CommandBuilder provides a fluent API for constructing commands. It handles
      serialization, correlation IDs, sequence numbers, and type URLs while
      providing compile-time and runtime validation.

      The builder pattern enables both OO-style client usage and can be adapted for
      router-based implementations.
    # docs:end:command_builder_contract

      Scenario: Build command with all required fields
        When I build a command for domain "orders" root "order-001"
        And I set the command type to "CreateOrder"
        And I set the command payload
        Then the built command should have domain "orders"
        And the built command should have root "order-001"
        And the built command should have type URL containing "CreateOrder"

      Scenario: Build with explicit correlation ID
        When I build a command for domain "orders"
        And I set correlation ID to "trace-123"
        And I set the command type and payload
        Then the built command should have correlation ID "trace-123"

      Scenario: Builder methods can be chained
        When I build a command using fluent chaining:
          """
          client.command("orders", root)
            .with_correlation_id("trace-456")
            .with_sequence(3)
            .with_command("CreateOrder", payload)
            .build()
          """
        Then the build should succeed
        And all chained values should be preserved

    just test-client python  # Test Python client via Rust harness
    just test-client go      # Test Go client via Rust harness
    just test-clients        # Test all clients

Example Testing Architecture
Example implementations use per-language Gherkin harnesses:

    examples/features/unit/*.feature (shared specifications)
        │
        ├── Python: behave + examples/python/features/steps/
        ├── Go: godog + examples/go/tests/steps/
        ├── Rust: cucumber-rs + examples/rust/tests/
        ├── Java: cucumber-junit5 + examples/java/tests/
        ├── C#: SpecFlow + examples/csharp/Tests/Steps/
        └── C++: cucumber-cpp + examples/cpp/tests/

Why per-language?
- Demonstrative for non-polyglot developers
- Developers see Gherkin AND step definitions in their language
- Educational code they can learn from and copy

    just examples python test  # behave
    just examples go test      # godog
    just examples rust test    # cucumber-rs

Unit Tests
No external dependencies. Tests interact only with the system under test: no I/O, no concurrency, no infrastructure.
Business Logic Only
Test business logic directly. No mocks. No frameworks. No infrastructure.
Aggregates are classes: instantiate, seed state, call a @handles method, assert on the returned event. Appliers are equally testable — pass a fresh state, call the method, assert on the mutation.
- Pass in state structs directly
- Assert on returned events
- No database connections, no message buses, no HTTP clients
If you’re writing mocks, you’re testing the wrong thing. Business logic should be deterministic methods that take data in and return data out.
Direct Method Testing
    import pytest

    # Player, DepositFunds, and CommandRejectedError come from the example's own modules.

    def test_deposit_increases_bankroll():
        agg = Player()
        agg.state.registered = True
        agg.state.bankroll = 1000

        event = agg.handle_deposit_funds(DepositFunds(amount=500))

        assert event.new_bankroll == 1500


    def test_deposit_rejected_when_not_registered():
        agg = Player()
        agg.state.registered = False

        with pytest.raises(CommandRejectedError):
            agg.handle_deposit_funds(DepositFunds(amount=500))

Why this works:
- @handles methods are ordinary methods: call them directly
- No mocking required: pass state directly, assert on returned events
- Proto serialization tested separately from business logic
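For readers who want to run tests of this shape without the full example repository, the following is a self-contained sketch of what such an aggregate might look like. Every definition here (`handles`, `Player`, `PlayerState`, `FundsDeposited`, `CommandRejectedError`) is a simplified assumption written for illustration, using plain dataclasses in place of the real proto types and framework decorator.

```python
from dataclasses import dataclass, field


class CommandRejectedError(Exception):
    """Raised when a command violates a business rule."""


@dataclass
class DepositFunds:
    amount: int


@dataclass
class FundsDeposited:
    amount: int
    new_bankroll: int


@dataclass
class PlayerState:
    registered: bool = False
    bankroll: int = 0


def handles(command_type):
    """Stand-in decorator: tags the method and leaves it directly callable."""
    def decorate(method):
        method.handles = command_type
        return method
    return decorate


@dataclass
class Player:
    state: PlayerState = field(default_factory=PlayerState)

    @handles(DepositFunds)
    def handle_deposit_funds(self, cmd: DepositFunds) -> FundsDeposited:
        if not self.state.registered:
            raise CommandRejectedError("player is not registered")
        return FundsDeposited(cmd.amount, self.state.bankroll + cmd.amount)
```

Because the handler only reads state and returns an event, a test needs nothing beyond constructing `Player`, seeding `state`, and calling the method.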
Test Naming
Use the pattern: test_<action>_<condition>_<expected_result>
| Language | Convention | Example |
|---|---|---|
| Python | snake_case | test_deposit_with_zero_amount_raises_error |
| Rust | snake_case | test_deposit_with_zero_amount_raises_error |
| Go | CamelCase | TestDepositWithZeroAmountRaisesError |
| Java/C# | camelCase | depositWithZeroAmountRaisesError |
Prioritize readability over rigid format. Tests are documentation.
Event Sourcing: The Any Boundary
Events cross a serialization boundary between business logic and the framework:
flowchart TB
subgraph BL["Business Logic"]
C["compute(cmd, state) → raw event"]
BS["build_state(state, events)"]
AE["_apply_event(state, event_any)"]
MS[mutate state]
end
subgraph FW["Framework"]
AP["Any.Pack(event)"]
EB["EventBook.pages[].event<br/>(Any-wrapped)"]
EX[extract events from pages]
UP["event_any.Unpack(typed_event)"]
end
C --> AP
AP -->|persist to EventBook| EB
EB --> EX
EX --> BS
BS --> AE
AE --> UP
UP --> MS
The framework stores events as opaque Any blobs—it doesn’t know business types. Business logic must decode the Any because only it knows PlayerRegistered, FundsDeposited, etc.
Full event sourcing test cycle:
    from google.protobuf.any_pb2 import Any

    # PlayerState, DepositFunds, compute, and build_state come from the example's own modules.

    def test_deposit_full_cycle():
        # 1. Start with state
        state = PlayerState(bankroll=100)
        cmd = DepositFunds(amount=50)

        # 2. compute() produces raw event
        event = compute(cmd, state)

        # 3. Wrap in Any (what framework does for persistence)
        event_any = Any()
        event_any.Pack(event, type_url_prefix="type.googleapis.com/")

        # 4. build_state applies Any-wrapped events → new state
        new_state = build_state(state, [event_any])

        assert new_state.bankroll == 150

Tests mimic the production boundary exactly, with no special test-only interfaces.
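The boundary itself can be illustrated without protobuf installed. The sketch below swaps in a hypothetical `AnyEnvelope` (a type URL plus JSON bytes) for the real `Any` message; the names `pack`, `apply_event`, and `AnyEnvelope` are assumptions for illustration, and only the division of responsibility mirrors the text above.

```python
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class FundsDeposited:
    amount: int


@dataclass(frozen=True)
class PlayerState:
    bankroll: int = 0


@dataclass(frozen=True)
class AnyEnvelope:
    """Stand-in for protobuf Any: a type URL plus opaque bytes."""
    type_url: str
    value: bytes


def pack(event) -> AnyEnvelope:
    # Framework side: wraps the event without knowing what it means.
    url = "type.googleapis.com/" + type(event).__name__
    return AnyEnvelope(url, json.dumps(asdict(event)).encode())


def apply_event(state: PlayerState, event_any: AnyEnvelope) -> PlayerState:
    # Business side: only this code knows how to decode FundsDeposited.
    if event_any.type_url.endswith("/FundsDeposited"):
        event = FundsDeposited(**json.loads(event_any.value))
        return replace(state, bankroll=state.bankroll + event.amount)
    return state  # unrecognised event types leave state unchanged


def build_state(state: PlayerState, events: list) -> PlayerState:
    for event_any in events:
        state = apply_event(state, event_any)
    return state


new_state = build_state(PlayerState(bankroll=100), [pack(FundsDeposited(amount=50))])
```

The framework half (`pack`) never inspects the payload, while the business half (`apply_event`) dispatches on the type URL, which is the same split the diagram describes.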
Integration Tests
Test Angzarr framework internals using synthetic aggregates (EchoAggregate, MultiEventAggregate). Prove the plumbing works, not business logic.
Infrastructure: In-process only. Uses RuntimeBuilder with SQLite and channel bus. No containers.
What they cover:
- Event persistence and sequence numbering
- IPC event bus (named pipes, domain filtering)
- gRPC over UDS transport
- Channel bus pub/sub delivery
- Saga activation and cross-domain command routing
- Snapshot/recovery, lossy bus resilience
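Of the items above, channel bus delivery with domain filtering is the easiest to picture in code. The following is a rough sketch of the idea only; `ChannelBus` here is a hypothetical in-memory class, not Angzarr's channel bus.

```python
from collections import defaultdict


class ChannelBus:
    """Hypothetical in-memory bus that routes events by domain."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # domain -> list of handlers

    def subscribe(self, domains, handler):
        for domain in domains:
            self._subscribers[domain].append(handler)

    def publish(self, domain, event):
        # Deliver only to handlers that subscribed to this domain.
        for handler in self._subscribers[domain]:
            handler(domain, event)


bus = ChannelBus()
player_seen, output_seen = [], []
bus.subscribe(["player"], lambda d, e: player_seen.append((d, e)))
bus.subscribe(["player", "table"], lambda d, e: output_seen.append((d, e)))

for domain in ("player", "table", "hand"):
    bus.publish(domain, f"{domain}-event")
```

An integration test at this level asserts on what each subscriber received, not on any business rule.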
Gherkin Example
    Feature: EventBus interface
      The EventBus distributes committed events to interested subscribers. After an
      aggregate persists events, the bus broadcasts them to sagas, projectors, and
      process managers that need to react.

      Background:
        Given an EventBus backend

      Scenario: Handlers only receive events from their subscribed domain
        Given the player-projector subscribes only to the player domain
        When events are published to player and table domains
        Then the player-projector receives only player events
        And never sees table events which are filtered out by the bus

      Scenario: Cross-domain handlers can subscribe to multiple domains
        Given the output-projector subscribed to player and table domains
        When events are published to player, table, and hand domains
        Then the output-projector receives player events because it subscribed
        And the output-projector receives table events because it subscribed
        And the output-projector does NOT receive hand events because it did not subscribe

      Scenario: Events arrive in sequence order from a single publisher
        Given a single-threaded hand aggregate publishing events
        And a projector subscribed to hand
        When events with sequences 0, 1, 2, 3, 4 are published in order
        Then the projector receives them in sequence order: 0, 1, 2, 3, 4

Contract Tests
Verify all storage implementations behave identically. Same interface, interchangeable backends.
Infrastructure: Testcontainers for the real backends (Postgres, Redis, NATS, immudb); the SQLite backend runs in-process without a container. This is the only test level that starts containers itself.
What contracts verify:
- EventStore: add, get, get_from, get_from_to, list_roots, sequence numbering
- SnapshotStore: save, load, delete
- PositionStore: get, set, checkpoint semantics
Location: tests/interfaces/
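The core idea of a contract test, one set of assertions run against several interchangeable backends, can be sketched with the standard library alone. Both store classes and the `add`/`get` signatures below are assumptions for illustration, not Angzarr's EventStore interface.

```python
import sqlite3


class SequenceConflict(Exception):
    pass


class MemoryEventStore:
    def __init__(self):
        self._streams = {}

    def add(self, root, sequence, payload):
        stream = self._streams.setdefault(root, [])
        if sequence != len(stream):
            raise SequenceConflict(f"expected {len(stream)}, got {sequence}")
        stream.append(payload)

    def get(self, root):
        return list(enumerate(self._streams.get(root, [])))


class SqliteEventStore:
    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE events (root TEXT, sequence INTEGER, payload TEXT,"
            " PRIMARY KEY (root, sequence))"
        )

    def add(self, root, sequence, payload):
        (count,) = self._db.execute(
            "SELECT COUNT(*) FROM events WHERE root = ?", (root,)
        ).fetchone()
        if sequence != count:
            raise SequenceConflict(f"expected {count}, got {sequence}")
        self._db.execute("INSERT INTO events VALUES (?, ?, ?)", (root, sequence, payload))

    def get(self, root):
        rows = self._db.execute(
            "SELECT sequence, payload FROM events WHERE root = ? ORDER BY sequence",
            (root,),
        )
        return [tuple(row) for row in rows]


def check_event_store_contract(store):
    """The same assertions run against every backend."""
    for seq in range(3):
        store.add("player", seq, f"event-{seq}")
    assert [seq for seq, _ in store.get("player")] == [0, 1, 2]
    try:
        store.add("player", 1, "stale")  # a stale writer must be rejected
    except SequenceConflict:
        return True
    raise AssertionError("stale write was accepted")


results = [check_event_store_contract(make()) for make in (MemoryEventStore, SqliteEventStore)]
```

If one backend passes and another fails the same function, the failing backend is the one with the bug, since the assertions never change.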
Gherkin Contract Specifications
Contract tests are written in Gherkin for readability. The same feature files run against every backend:

    Feature: EventStore interface
      The EventStore is the source of truth for all state changes in the system.
      Every aggregate's current state is derived by replaying its events. This
      immutability provides a complete audit trail, enables temporal queries, and
      allows the system to reconstruct any aggregate's state at any point in history.

      Background:
        Given an EventStore backend

      Scenario: First event in an aggregate's history starts at sequence 0
        Given an aggregate "player" with no events
        When I add 1 event to the aggregate
        Then the aggregate should have 1 event
        And the first event should have sequence 0

      Scenario: Multiple events from a single command receive consecutive sequences
        Given an aggregate "player" with no events
        When I add 5 events to the aggregate
        Then the aggregate should have 5 events
        And events should have consecutive sequences starting from 0

      Scenario: Concurrent writers are detected via sequence mismatch
        Given an aggregate "player" with 3 events
        When I try to add an event with sequence 1
        Then the operation should fail with a sequence conflict

      Scenario: Stale writers cannot overwrite history
        Given an aggregate "player" with 3 events
        When I try to add an event with sequence 0
        Then the operation should fail with a sequence conflict

Running Contract Tests

    # Run against SQLite (fast, no containers)
    STORAGE_BACKEND=sqlite cargo test --test interfaces --features sqlite

    # Run against PostgreSQL (testcontainers)
    STORAGE_BACKEND=postgres cargo test --test interfaces --features postgres

    # Run against immudb (testcontainers)
    STORAGE_BACKEND=immudb cargo test --test interfaces --features immudb

Contract tests ensure you can swap backends without behavior changes. If SQLite passes but Postgres fails, the Postgres implementation has a bug.
Testcontainers
Contract tests use testcontainers to provision real databases:

    // Import paths assume a recent testcontainers release; adjust them to your version.
    use testcontainers::{
        core::{IntoContainerPort, WaitFor},
        runners::AsyncRunner,
        ContainerAsync, GenericImage, ImageExt,
    };

    async fn start_postgres() -> (ContainerAsync<GenericImage>, String) {
        // Describe the image and how to tell when it is ready.
        let image = GenericImage::new("postgres", "16")
            .with_exposed_port(5432.tcp())
            .with_wait_for(WaitFor::message_on_stdout(
                "database system is ready",
            ));

        let container = image
            .with_env_var("POSTGRES_USER", "testuser")
            .with_env_var("POSTGRES_PASSWORD", "testpass")
            .with_env_var("POSTGRES_DB", "testdb")
            .start()
            .await
            .expect("Failed to start container");

        // The container port is mapped to a random free port on the host.
        let host_port = container.get_host_port_ipv4(5432).await.unwrap();
        let url = format!("postgres://testuser:testpass@localhost:{}/testdb", host_port);

        (container, url)
    }

Benefits:
- Zero setup — tests start containers automatically
- Isolation — each test gets fresh state
- Realistic — tests run against real databases, not mocks
Acceptance Tests
Test business behavior through the full stack. Written in Gherkin, describing what the system does from a business perspective.
Location: examples/rust/e2e/tests/features/
Gherkin Authoring
Gherkin is business-readable specification, not test code. Describe what the system does and why it matters, never how.
The litmus test: “Will this wording change if the implementation changes?” If yes, abstract to behavior.
Declarative Over Imperative
    # Wrong: UI choreography
    When I click "Add to Cart"
    And I click "Checkout"
    And I fill in "Card Number" with "4111..."

    # Right: Business intent
    When I purchase the items in my cart

Given-When-Then Semantics
| Keyword | Purpose | Example |
|---|---|---|
| Given | Establish context (past state) | Given a player with $500 in their bankroll |
| When | Single triggering action | When the player reserves $200 for the table |
| Then | Verify business outcomes | Then the player's available balance is $300 |
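The three example steps in the table can be traced through a tiny domain model. This is a sketch under assumptions: `PlayerAccount`, `reserve`, and `InsufficientFunds` are hypothetical names, and a real step definition would go through the aggregate and its events.

```python
from dataclasses import dataclass


class InsufficientFunds(Exception):
    pass


@dataclass
class PlayerAccount:
    bankroll: int
    reserved: int = 0

    @property
    def available(self) -> int:
        return self.bankroll - self.reserved

    def reserve(self, amount: int) -> None:
        if amount > self.available:
            raise InsufficientFunds("insufficient funds")
        self.reserved += amount


# Given a player with $500 in their bankroll (context: past state)
player = PlayerAccount(bankroll=500)
# When the player reserves $200 for the table (single triggering action)
player.reserve(200)
# Then the player's available balance is $300 (business outcome)
available_after_reserve = player.available
```

Note that `Given` only builds state, `When` performs exactly one action, and `Then` only reads; keeping those roles separate is what lets one scenario test one behavior.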
Business Language
| Technical (Avoid) | Business (Prefer) |
|---|---|
| API returns 201 | Order is confirmed |
| Database has record | Customer exists |
| Event is published | Notification is sent |
| State machine transitions | Hand progresses to showdown |
Exception: Framework tests (event stores, buses) use technical vocabulary—it’s their domain.
One Scenario, One Behavior
Each scenario tests exactly one thing. Multiple When-Then pairs = multiple scenarios.
Feature Preambles
Open features with context explaining what this capability enables, why it matters, and what breaks if it doesn’t work:

    Feature: Player fund reservation

      Players must reserve funds when joining a table. This ensures:
      - Players can cover their buy-in before sitting down
      - Funds are locked and cannot be double-spent across tables

      Without fund reservation, players could join multiple tables
      with the same bankroll, creating settlement disputes.

Error Cases Are First-Class
Don’t just test happy paths. Business rules live in constraints:

    Scenario: Cannot reserve more than available balance
      Given Alice has $500 available
      When Alice tries to reserve $600
      Then the request fails with "insufficient funds"
      And Alice's available balance remains $500

Cross-Domain Scenarios
Show saga/PM translations explicitly without exposing implementation:

    Scenario: Order completion triggers fulfillment
      Given an order with items:
        | sku    | quantity |
        | WIDGET | 3        |
      When the order is completed
      Then a fulfillment request is created with:
        | sku    | quantity |
        | WIDGET | 3        |

Shared Feature Files
The same Gherkin scenarios validate all language implementations. angzarr-project/features/acceptance/ is the canonical source; each example repo pulls it in via the project submodule.

    angzarr-project/features/acceptance/
    ├── poker_game.feature   # End-to-end poker flow
    └── sync_modes.feature   # Commit vs cascade semantics

Each of the six language repos (Python, Rust, Go, Java, C#, C++) implements 169 step definitions against the same features: same behavior, same assertions, different runtimes.
Running Acceptance Tests
    # Python uses behave
    cd examples/python
    behave features/

    # Run specific tags
    behave features/ --tags=@player

Two Execution Modes
Acceptance tests support two backends through a CommandClient gRPC abstraction; the same step definitions drive either mode without change:
| Mode | Description | Infrastructure | Use Case |
|---|---|---|---|
| In-process (default) | In-process RuntimeBuilder | Channel bus + SQLite | Fast local development |
| Direct | Remote gRPC against deployed cluster (chart 0.5.1) | NATS + Postgres | K8s validation, CI |
In-process mode uses test infrastructure, not production tooling:
- Channel bus (in-memory) instead of NATS JetStream
- SQLite (in-memory) instead of Postgres
- Single process, no containers required
This trades production fidelity for speed. CI runs Direct mode against Kind with port-forwards on 1310/1311/1312 (player/table/hand).

    # In-process (default): channel bus + SQLite
    just test-unit

    # Direct mode: real infrastructure via gRPC against Kind
    just test-acceptance

test-unit drives step definitions against RuntimeBuilder in-process; test-acceptance drives the same steps through CommandClient gRPC calls to deployed aggregates. All six language repos are green against chart 0.5.1.
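How one set of step definitions can drive both modes is easiest to see as an interface with two implementations. In this sketch only the name `CommandClient` comes from the text; the `execute` signature, `InProcessClient`, `RemoteClient`, and the toy `handle` function are all assumptions, and the remote client takes a plain callable where a real one would hold a gRPC stub.

```python
from typing import Callable, Protocol


class CommandClient(Protocol):
    def execute(self, domain: str, command: str, payload: dict) -> list: ...


class InProcessClient:
    """Calls a handler function directly, with no network hop."""

    def __init__(self, handler: Callable[[str, str, dict], list]):
        self._handler = handler

    def execute(self, domain, command, payload):
        return self._handler(domain, command, payload)


class RemoteClient:
    """Placeholder for a gRPC-backed client; `send` stands in for the stub call."""

    def __init__(self, send: Callable[[dict], list]):
        self._send = send

    def execute(self, domain, command, payload):
        return self._send({"domain": domain, "command": command, "payload": payload})


def handle(domain, command, payload):
    # Toy aggregate logic shared by both modes in this sketch.
    return [f"{domain}.{command}.accepted"]


def step_player_registers(client: CommandClient) -> list:
    # A step definition depends only on the CommandClient interface.
    return client.execute("player", "RegisterPlayer", {"name": "Alice"})


in_process = step_player_registers(InProcessClient(handle))
remote = step_player_registers(
    RemoteClient(lambda req: handle(req["domain"], req["command"], req["payload"]))
)
```

Because the step function never names a concrete client, switching modes is a matter of which implementation the test environment constructs.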
Running Tests
Using just (recommended)

    # Unit tests
    just test

    # Integration tests
    just integration

    # Acceptance tests
    just acceptance

    # Contract tests (all backends)
    just test-interfaces-all

Direct commands

    # Unit tests
    cargo test --lib

    # Contract tests (testcontainers for real backends)
    STORAGE_BACKEND=sqlite cargo test --test interface_tests --features sqlite
    STORAGE_BACKEND=postgres cargo test --test interface_tests --features postgres

    # Acceptance tests
    cargo test --package e2e --test acceptance

Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| UI steps in Gherkin | “click”, “fill in”, “navigate” | Use business intent |
| Technical assertions | “database has row”, “event published” | Use business outcomes |
| Conditional logic | “if valid then X else Y” | Separate scenarios |
| Vague outcomes | “works correctly” | Be specific |
| Hardcoded test data | Magic numbers everywhere | Use meaningful descriptions |
| Skipping TDD | Tests written after code | Write test first, watch it fail |
| Testing mocks | Mock everything | Test real implementations |
Definition of Done
A task is complete when:
- Implementation code exists
- Tests exist and exercise the implementation
- Tests actually run (not just specifications)
- Tests pass (cargo test, pytest, etc.)
- For Gherkin: step definitions implemented and runner passes
“Tests pass” means running the actual test command, not just writing test files.