
Observability

Angzarr provides full observability through OpenTelemetry, exporting traces, metrics, and logs via OTLP.


Architecture


Feature Flag

Enable OpenTelemetry with the otel feature:

cargo build --features otel

Without this flag, only console logging via tracing is available.
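
If you consume Angzarr as a library rather than building it directly, the feature can also be enabled from your own manifest. A minimal sketch, assuming a published `angzarr` crate (the version below is a placeholder):

```toml
[dependencies]
# Placeholder version — pin to the release you actually depend on.
angzarr = { version = "*", features = ["otel"] }
```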


Environment Variables

| Variable | Description | Default |
|---|---|---|
| `OTEL_SERVICE_NAME` | Service name in traces/metrics | `angzarr` |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | Collector endpoint | `http://localhost:4317` |
| `OTEL_RESOURCE_ATTRIBUTES` | Additional resource attributes | - |
| `RUST_LOG` | Log level filter | `info` |

export OTEL_SERVICE_NAME=angzarr-order
export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
export OTEL_RESOURCE_ATTRIBUTES=deployment.environment=prod,service.version=1.0.0
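
In Kubernetes, the same variables are typically set on the container spec rather than exported in a shell. A sketch, assuming a collector service in the `monitoring` namespace (names are illustrative, not prescribed by Angzarr):

```yaml
# Illustrative container env fragment — service and namespace names are assumptions.
env:
  - name: OTEL_SERVICE_NAME
    value: angzarr-order
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: http://otel-collector.monitoring.svc.cluster.local:4317
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: deployment.environment=prod,service.version=1.0.0
```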

Metrics

Command Pipeline

| Metric | Type | Labels | Description |
|---|---|---|---|
| `angzarr.command.duration` | Histogram | domain, outcome | Command handling latency |
| `angzarr.command.total` | Counter | domain, outcome | Total commands processed |
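
Assuming the collector exports these to Prometheus with dots translated to underscores (the convention the alerting examples in this page also rely on), typical queries over the command pipeline might look like:

```promql
# p99 command latency per domain over a 5m window
histogram_quantile(0.99,
  sum by (domain, le) (rate(angzarr_command_duration_bucket[5m])))

# Error rate as a fraction of all commands
sum(rate(angzarr_command_total{outcome="error"}[5m]))
  / sum(rate(angzarr_command_total[5m]))
```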

Event Bus

| Metric | Type | Labels | Description |
|---|---|---|---|
| `angzarr.bus.publish.duration` | Histogram | bus_type, domain | Publish operation latency |
| `angzarr.bus.publish.total` | Counter | bus_type, domain | Total publish operations |

Orchestration

| Metric | Type | Labels | Description |
|---|---|---|---|
| `angzarr.saga.duration` | Histogram | name | Saga execution time |
| `angzarr.saga.retry.total` | Counter | name | Saga retry attempts |
| `angzarr.saga.compensation.total` | Counter | name | Compensations triggered |
| `angzarr.pm.duration` | Histogram | name | Process manager execution time |
| `angzarr.projector.duration` | Histogram | name | Projector handling time |

Labels

| Label | Values |
|---|---|
| `domain` | Aggregate domain (e.g., `player`, `table`) |
| `outcome` | `success`, `rejected`, `error` |
| `bus_type` | `amqp`, `kafka`, `channel`, `ipc` |
| `component` | `aggregate`, `saga`, `projector`, `process_manager` |
| `name` | Component instance name |

Traces

Angzarr propagates W3C TraceContext headers across gRPC boundaries, enabling distributed tracing through:

  • Client → Coordinator → Business Logic
  • Event Bus → Projector/Saga → Business Logic
  • Saga → Target Aggregate

Each span includes:

  • domain attribute
  • correlation_id (if present)
  • Event/command type URLs
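
Concretely, W3C TraceContext propagation means each gRPC call carries a `traceparent` metadata entry. Its format, with the example values from the W3C specification:

```
# version - trace-id (32 hex) - parent span id (16 hex) - trace flags
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```

Any hop that forwards this header unchanged (generating only a new parent span id) keeps the downstream spans stitched to the same trace.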

Kubernetes Deployment

Deploy Observability Stack

# Add Helm repos
helm repo add grafana https://grafana.github.io/helm-charts
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

# Deploy observability stack
helm install angzarr-otel ./deploy/helm/observability -n monitoring --create-namespace

This deploys:

  • OTel Collector — OTLP receiver on port 4317 (gRPC) and 4318 (HTTP)
  • Tempo — Distributed tracing backend
  • Prometheus — Metrics via remote write
  • Loki — Log aggregation
  • Grafana — Visualization (NodePort 30300)
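
To sanity-check the stack after install (the Grafana service name below is an assumption — it depends on the Helm release name, so adjust to what `kubectl get svc` actually reports):

```shell
# Confirm the collector, Tempo, Prometheus, Loki, and Grafana pods are running
kubectl get pods -n monitoring

# Reach Grafana via the NodePort (30300), or port-forward locally:
kubectl port-forward -n monitoring svc/angzarr-otel-grafana 3000:80
```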

Enable OTel on Angzarr

helm install angzarr ./deploy/helm/angzarr \
-f values-local.yaml \
-f values-observability.yaml \
-n angzarr

Grafana Dashboards

Pre-built dashboards are deployed automatically:

  • Command Pipeline — Throughput, latency percentiles, error rates
  • Event Bus — Publish throughput, latency distribution
  • Orchestration — Saga execution, retry rates, compensations
  • Topology — Live system topology graph

Alerting Examples

groups:
  - name: angzarr
    rules:
      - alert: HighCommandLatency
        expr: histogram_quantile(0.99, rate(angzarr_command_duration_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High command latency on {{ $labels.domain }}"

      - alert: SagaCompensationSpike
        expr: rate(angzarr_saga_compensation_total[5m]) > 0.1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Saga compensations increasing"

Next Steps