
Graceful Failure

The player's bet was accepted. The pot updated. Then the hand discovered they'd already folded. Now what?


The Problem

Distributed systems fail in the middle. A saga issues a command, the target aggregate rejects it, and now the source needs to know—and respond.

Traditional solutions—two-phase commit, distributed transactions—are complex, slow, and often unavailable across service boundaries. Event sourcing offers a different approach: let failures happen, record them, and compensate.


How Compensation Works

When a saga issues a command that gets rejected:

illustrative - compensation flow
1. Hand emits BetPlaced event
2. Saga (Hand→Table) receives event, issues DeductFromPot → Table
3. Table rejects: "Player already folded"
4. Framework sends RejectionNotification directly to Hand aggregate
5. Hand's @rejected handler emits compensation event: BetReverted

The audit trail shows exactly what happened: the attempt, the rejection, and the recovery.


The Flow

The rejection notification bypasses the saga entirely. The framework routes it directly to the source aggregate using the return address stamped on the original command. The saga is stateless—it doesn't need to know about rejections. The source aggregate decides how to compensate.
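
The routing described above can be sketched in memory. This is a minimal illustration, not the framework's implementation: the `Command.return_address` field, the `dispatch` function, and the `on_rejected` method are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Command:
    name: str
    target: str          # aggregate expected to handle the command
    return_address: str  # source aggregate, stamped when the command is issued


class Hand:
    def __init__(self) -> None:
        self.events: List[Tuple[str, Optional[str]]] = [("BetPlaced", None)]

    def on_rejected(self, command: Command, reason: str) -> None:
        # The source aggregate, not the saga, chooses the compensation.
        self.events.append(("BetReverted", reason))


class Table:
    def __init__(self, player_folded: bool) -> None:
        self.player_folded = player_folded
        self.events: List[Tuple[str, Optional[str]]] = []

    def handle(self, command: Command) -> Optional[str]:
        # Return a rejection reason, or None when the command is accepted.
        if self.player_folded:
            return "Player already folded"
        self.events.append((command.name, None))
        return None


def saga_hand_to_table(source: str) -> Command:
    # Stateless translation of BetPlaced into a command; no rejection logic.
    return Command("DeductFromPot", target="table", return_address=source)


def dispatch(command: Command, aggregates: Dict[str, Any]) -> None:
    reason = aggregates[command.target].handle(command)
    if reason is not None:
        # Route straight back to the source; the saga is never called again.
        aggregates[command.return_address].on_rejected(command, reason)


hand, table = Hand(), Table(player_folded=True)
dispatch(saga_hand_to_table(source="hand"), {"hand": hand, "table": table})
print(hand.events)
```

Because the return address travels with the command, the saga function stays a pure translation from event to command and holds no state about outstanding requests.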


Handling Rejections

Register handlers for specific rejection scenarios:

def handle_join_rejected(
    notification: types.Notification,
    state: PlayerState,
) -> types.EventBook | None:
    """Handle JoinTable rejection by releasing reserved funds.

    Called when the JoinTable command (issued by saga-player-table after
    FundsReserved) is rejected by the Table aggregate.
    """
    from google.protobuf.any_pb2 import Any

    # Extract rejection details from the notification payload
    rejection = types.RejectionNotification()
    if notification.payload:
        notification.payload.Unpack(rejection)

    # Extract table_root from the rejected command
    table_root = b""
    if rejection.rejected_command and rejection.rejected_command.cover:
        if rejection.rejected_command.cover.root:
            table_root = rejection.rejected_command.cover.root.value

    # Release the funds that were reserved for this table
    table_key = table_root.hex()
    reserved_amount = state.table_reservations.get(table_key, 0)
    new_reserved = state.reserved_funds - reserved_amount
    new_available = state.bankroll - new_reserved

    event = player.FundsReleased(
        amount=poker_types.Currency(amount=reserved_amount, currency_code="CHIPS"),
        table_root=table_root,
        new_available_balance=poker_types.Currency(
            amount=new_available, currency_code="CHIPS"
        ),
        new_reserved_balance=poker_types.Currency(
            amount=new_reserved, currency_code="CHIPS"
        ),
        released_at=now(),
    )

    # Pack the event
    event_any = Any()
    event_any.Pack(event, type_url_prefix="type.googleapis.com/")

    # Build the EventBook using the notification's cover for routing
    return types.EventBook(
        cover=notification.cover,
        pages=[types.EventPage(header=types.PageHeader(sequence=0), event=event_any)],
    )


The framework routes rejections to the appropriate handler based on the rejected command's domain and type.


Compensation in Poker

Different failures require different responses:

| Scenario | Compensation |
|---|---|
| Player disconnects mid-action | Auto-fold, return to action queue |
| Insufficient chips for blind | Sit out, notify table |
| Invalid bet amount | Reject action, prompt retry |
| Table closed during hand | Refund all pots, end hand |
| Timer expired | Auto-check or auto-fold |

The aggregate decides the business response. The framework ensures the notification arrives.


RevocationResponse Options

When handling a rejection, you can specify additional actions:

Illustrative Example

The following shows the RevocationResponse pattern. Your handlers will use your domain's specific events and rejection reasons.

illustrative - RevocationResponse options
@rejected(domain="player", command="ReserveFunds")
def handle_reserve_failed(self, notification: Notification):
    rejection = RejectionNotification()
    notification.payload.Unpack(rejection)

    if rejection.rejection_reason == "insufficient_balance":
        # Return event directly; the framework auto-applies it
        return PlayerSatOut(reason="insufficient_funds")
    else:
        # Delegate to framework for DLQ/escalation
        return delegate_to_framework(
            reason=rejection.rejection_reason,
            send_to_dead_letter=True,
            escalate=True,  # Alert floor manager
        )

| Flag | Effect |
|---|---|
| emit_system_revocation | Emit SagaCompensationFailed event |
| send_to_dead_letter_queue | Route to DLQ for manual review |
| escalate | Trigger configured webhook (floor manager alert) |
| abort | Stop saga chain, propagate error |
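
The four flags can be modelled as a small value object. The sketch below is illustrative only: the field names are copied from the flag list above, but the `RevocationResponse` dataclass, the `reason` field, and the `planned_actions` helper are assumptions made for the example, and the real message type may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RevocationResponse:
    # Field names mirror the documented flags; the class itself is hypothetical.
    emit_system_revocation: bool = False
    send_to_dead_letter_queue: bool = False
    escalate: bool = False
    abort: bool = False
    reason: str = ""  # assumed free-text field, not part of the flag list


def planned_actions(response: RevocationResponse) -> List[str]:
    """Translate the flags that are set into the effects each one triggers."""
    actions = []
    if response.emit_system_revocation:
        actions.append("emit SagaCompensationFailed event")
    if response.send_to_dead_letter_queue:
        actions.append("route to DLQ for manual review")
    if response.escalate:
        actions.append("trigger configured webhook")
    if response.abort:
        actions.append("stop saga chain and propagate error")
    return actions


response = RevocationResponse(
    send_to_dead_letter_queue=True,
    escalate=True,
    reason="table_unreachable",
)
print(planned_actions(response))
```

The flags are independent, so a handler can combine them, for example sending to the DLQ and escalating in the same response.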

Multi-Step Compensation

Complex workflows may require compensating multiple steps:

Illustrative Example

The following shows multi-step compensation patterns. Your implementation will define domain-specific events for each compensating action.

illustrative - multi-step compensation
# Player tried to join table but verification failed
@rejected(domain="verification", command="VerifyPlayer")
def handle_verification_failed(self, notification: Notification):
    # Already reserved their seat and took their buy-in
    # Need to undo both: return multiple events as a tuple
    return (
        SeatReleased(seat=self.state.pending_seat),
        BuyInRefunded(player_id=self.state.pending_player, amount=self.state.pending_buyin),
        JoinRejected(player_id=self.state.pending_player, reason="verification_failed"),
    )

Each compensation event is recorded. The audit trail shows the full sequence: attempt, failure, recovery.
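
To make the recording step concrete, the sketch below shows one plausible way a handler's return value (a single event, a tuple of events, or nothing) could be appended to a stream with consecutive sequence numbers. The `append_compensation` helper, the `(sequence, event)` stream layout, and the earlier `SeatReserved` and `BuyInTaken` entries are assumptions for illustration, not the framework's storage model.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(frozen=True)
class SeatReleased:
    seat: int


@dataclass(frozen=True)
class BuyInRefunded:
    player_id: str
    amount: int


@dataclass(frozen=True)
class JoinRejected:
    player_id: str
    reason: str


def append_compensation(stream: List[Tuple[int, Any]], result: Any) -> None:
    """Append a handler result to the stream, one sequence number per event."""
    if result is None:
        return
    # Normalize a single event to a one-element tuple.
    events = result if isinstance(result, tuple) else (result,)
    next_seq = stream[-1][0] + 1 if stream else 0
    for offset, event in enumerate(events):
        stream.append((next_seq + offset, event))


# Hypothetical history before the rejection arrived.
stream: List[Tuple[int, Any]] = [(0, "SeatReserved"), (1, "BuyInTaken")]

append_compensation(
    stream,
    (
        SeatReleased(seat=3),
        BuyInRefunded(player_id="p1", amount=500),
        JoinRejected(player_id="p1", reason="verification_failed"),
    ),
)
for sequence, event in stream:
    print(sequence, event)
```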


Why This Matters

Regulated industries require demonstrable fairness:

  • Every player action must be recorded
  • Every rejection must be explained
  • Every compensation must be traceable

When a regulator asks "why did this player lose their bet?", the event history shows:

  1. The bet was placed
  2. The deduction was attempted
  3. The deduction was rejected (reason: player had folded)
  4. The bet was reverted
  5. The player was notified

No silent failures. No unexplained state changes.


Revocation vs Compensate

The framework provides two mechanisms for undoing events:

| Mechanism | Original Event | Client Code | Use Case |
|---|---|---|---|
| Revocation | Hidden (becomes NoOp) | None needed | Full undo, clean state |
| Compensate | Visible | Handler implements inverse | Partial undo, business logic |

Revocation (Framework-Only)

Revocation hides the original event at read time. No client code required.

Revocation flow
1. Framework writes Revocation { sequences: [5, 6], reason: "timeout" }
2. At read time, events 5 and 6 become NoOp
3. Business logic never sees the original events

Use revocation when:

  • A full undo with "never happened" semantics is needed
  • No business logic required for the undo
  • Events came from a failed cascade or timeout

Compensate (Client-Implemented)

Compensate keeps the original event visible and routes to a client handler. The handler emits inverse events.

Compensate flow
1. Framework writes Compensate { sequences: [5], reason: "order_cancelled" }
2. Framework routes to client's compensation handler
3. Handler receives original event, emits inverse (e.g., InventoryReleased)
4. Both original and compensation events visible in stream

Use compensate when:

  • Business logic must decide how to undo
  • Only a partial undo is appropriate, depending on context
  • Explicit audit trail required (both events visible)
  • Third-party notifications or side effects need reversal

Example: Compensation Handler

illustrative - compensation handler registration
@compensate("InventoryReserved")
def compensate_reservation(self, original_event, reason):
    # Original event remains visible
    # Emit inverse event
    return InventoryReleased(
        sku=original_event.sku,
        qty=original_event.qty,
        reason=reason,
    )

Read-Time Behavior

| Event Type | At Read Time |
|---|---|
| Revocation marker | NoOp |
| Revoked event | NoOp |
| Compensate marker | NoOp |
| Compensated event | Visible (key difference) |
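
The read-time rules can be expressed as a small projection over the stream. The sketch below is a simplified model under stated assumptions: markers are stored in the same stream as events and carry their own sequence numbers, and `Event`, `Revocation`, `Compensate`, and `read_view` are hypothetical names for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Event:
    sequence: int
    name: str


@dataclass(frozen=True)
class Revocation:
    sequence: int
    sequences: Tuple[int, ...]
    reason: str


@dataclass(frozen=True)
class Compensate:
    sequence: int
    sequences: Tuple[int, ...]
    reason: str


Entry = Union[Event, Revocation, Compensate]
NOOP = "NoOp"


def read_view(stream: List[Entry]) -> List[Tuple[int, str]]:
    """Project the stored stream into what business logic sees at read time."""
    revoked = {
        seq
        for entry in stream
        if isinstance(entry, Revocation)
        for seq in entry.sequences
    }
    view = []
    for entry in stream:
        if isinstance(entry, (Revocation, Compensate)):
            view.append((entry.sequence, NOOP))  # markers are never visible
        elif entry.sequence in revoked:
            view.append((entry.sequence, NOOP))  # revoked events are hidden
        else:
            view.append((entry.sequence, entry.name))  # compensated events stay visible
    return view


stream: List[Entry] = [
    Event(5, "InventoryReserved"),
    Event(6, "SeatHeld"),
    Compensate(7, sequences=(5,), reason="order_cancelled"),
    Event(8, "InventoryReleased"),
    Revocation(9, sequences=(6,), reason="timeout"),
]
for row in read_view(stream):
    print(row)
```

Event 5 was compensated, so it remains visible next to its inverse at sequence 8, while event 6 was revoked and reads as NoOp.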
