Observability
Angzarr provides full observability through OpenTelemetry, exporting traces, metrics, and logs via OTLP.
Architecture
```mermaid
flowchart LR
    subgraph Sidecars[Angzarr Sidecars]
        Coord[Coordinator]
        Proj[Projector]
        Saga[Saga]
    end
    subgraph Collector[OTel Collector]
        OTLP[OTLP Receiver]
        Proc[Processors:<br/>batch, memory_limit]
    end
    subgraph Backends
        Tempo[Tempo<br/>traces]
        Prom[Prometheus]
        Loki[Loki<br/>logs]
    end
    Grafana[Grafana<br/>Dashboards<br/>Trace viewer<br/>Log explorer]
    Sidecars --> Collector --> Backends --> Grafana
```
Feature Flag
Enable OpenTelemetry with the `otel` Cargo feature:
```bash
cargo build --features otel
```
Without this flag, only console logging via `tracing` is available.
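To illustrate what a compile-time flag means in practice, here is a minimal Rust sketch of code whose behavior depends on whether it was built with `--features otel`. `TelemetryBackend` and `selected_backend` are hypothetical names for illustration, not Angzarr's API.

```rust
/// Which telemetry pipeline a build would set up (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum TelemetryBackend {
    /// Console logging only, via `tracing`.
    Console,
    /// Console logging plus OTLP export of traces, metrics, and logs.
    ConsoleAndOtlp,
}

/// Hypothetical helper: `cfg!(feature = "otel")` is resolved at compile time,
/// so it is `true` only when the crate is built with `--features otel`.
fn selected_backend() -> TelemetryBackend {
    if cfg!(feature = "otel") {
        TelemetryBackend::ConsoleAndOtlp
    } else {
        TelemetryBackend::Console
    }
}

fn main() {
    println!("telemetry backend: {:?}", selected_backend());
}
```

`cfg!` still compiles both branches. In a crate where the OpenTelemetry dependencies are optional, the OTel-specific code would instead sit behind `#[cfg(feature = "otel")]` attributes so that it is left out of the build entirely when the feature is off.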
Environment Variables
| Variable | Description | Default |
|---|---|---|
| `OTEL_SERVICE_NAME` | Service name reported in traces and metrics | `angzarr` |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP collector endpoint | `http://localhost:4317` |
| `OTEL_RESOURCE_ATTRIBUTES` | Additional resource attributes, as comma-separated `key=value` pairs | - |
| `RUST_LOG` | Log level filter | `info` |
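A minimal Rust sketch of how these variables and their defaults could be resolved, including splitting a value such as `deployment.environment=prod,service.version=1.0.0` into key/value pairs. `OtelSettings`, `resolve`, and `parse_resource_attributes` are hypothetical names. OpenTelemetry SDKs generally read these standard variables themselves; the sketch only shows the precedence (environment value first, otherwise the default from the table), and its parser is simplified: it does not percent-decode values and silently skips malformed entries.

```rust
use std::collections::BTreeMap;

/// Settings resolved from the OTEL_* variables (hypothetical struct).
#[derive(Debug, PartialEq)]
struct OtelSettings {
    service_name: String,
    endpoint: String,
    resource_attributes: BTreeMap<String, String>,
}

/// Parse a `key1=value1,key2=value2` list as used by OTEL_RESOURCE_ATTRIBUTES.
/// Simplification: entries without '=' or with an empty key are skipped,
/// and percent-encoded values are not decoded.
fn parse_resource_attributes(raw: &str) -> BTreeMap<String, String> {
    raw.split(',')
        .filter_map(|pair| {
            let (key, value) = pair.split_once('=')?;
            let key = key.trim();
            if key.is_empty() {
                None
            } else {
                Some((key.to_string(), value.trim().to_string()))
            }
        })
        .collect()
}

/// Resolve settings through a lookup function so the logic can be exercised
/// without modifying the real process environment.
fn resolve(lookup: impl Fn(&str) -> Option<String>) -> OtelSettings {
    OtelSettings {
        service_name: lookup("OTEL_SERVICE_NAME").unwrap_or_else(|| "angzarr".to_string()),
        endpoint: lookup("OTEL_EXPORTER_OTLP_ENDPOINT")
            .unwrap_or_else(|| "http://localhost:4317".to_string()),
        resource_attributes: lookup("OTEL_RESOURCE_ATTRIBUTES")
            .map(|raw| parse_resource_attributes(&raw))
            .unwrap_or_default(),
    }
}

fn main() {
    // Read the real environment; unset (or non-UTF-8) variables fall back to defaults.
    let settings = resolve(|name| std::env::var(name).ok());
    println!("{settings:?}");
}
```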
For example:
```bash
export OTEL_SERVICE_NAME=angzarr-order
export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
export OTEL_RESOURCE_ATTRIBUTES=deployment.environment=prod,service.version=1.0.0
```
Metrics
Command Pipeline
| Metric | Type | Labels | Description |
|---|---|---|---|
| `angzarr.command.duration` | Histogram | `domain`, `outcome` | Command handling latency |
| `angzarr.command.total` | Counter | `domain`, `outcome` | Total commands processed |
Event Bus
| Metric | Type | Labels | Description |
|---|---|---|---|
| `angzarr.bus.publish.duration` | Histogram | `bus_type`, `domain` | Publish operation latency |
| `angzarr.bus.publish.total` | Counter | `bus_type`, `domain` | Total publish operations |
Orchestration
| Metric | Type | Labels | Description |
|---|---|---|---|
| `angzarr.saga.duration` | Histogram | `name` | Saga execution time |
| `angzarr.saga.retry.total` | Counter | `name` | Saga retry attempts |
| `angzarr.saga.compensation.total` | Counter | `name` | Compensations triggered |
| `angzarr.pm.duration` | Histogram | `name` | Process manager execution time |
| `angzarr.projector.duration` | Histogram | `name` | Projector handling time |
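Classic Prometheus metric names allow only letters, digits, underscores, and colons, so the dotted OpenTelemetry names above surface in Prometheus with underscores. This is why the alert rules further down query `angzarr_command_duration_bucket` and `angzarr_saga_compensation_total`. A simplified Rust sketch of that translation, assuming plain character substitution; `prometheus_name` and `histogram_series` are hypothetical helpers, and real OTel-to-Prometheus exporters may also append unit suffixes (such as `_seconds`) or `_total` depending on configuration.

```rust
/// Simplified OTel -> Prometheus name translation: any character outside
/// [a-zA-Z0-9_:] becomes '_'. Assumption: no unit or `_total` suffix is added,
/// and names starting with a digit are not handled.
fn prometheus_name(otel_name: &str) -> String {
    otel_name
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' || c == ':' { c } else { '_' })
        .collect()
}

/// A histogram is exposed as `_bucket` (one series per `le`), `_sum`, and `_count`.
fn histogram_series(otel_name: &str) -> [String; 3] {
    let base = prometheus_name(otel_name);
    [format!("{base}_bucket"), format!("{base}_sum"), format!("{base}_count")]
}

fn main() {
    println!("{}", prometheus_name("angzarr.saga.compensation.total"));
    println!("{:?}", histogram_series("angzarr.command.duration"));
}
```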
Labels
| Label | Values |
|---|---|
| `domain` | Aggregate domain (e.g., `player`, `table`) |
| `outcome` | `success`, `rejected`, `error` |
| `bus_type` | `amqp`, `kafka`, `channel`, `ipc` |
| `component` | `aggregate`, `saga`, `projector`, `process_manager` |
| `name` | Component instance name |
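The `outcome` label splits command results three ways. A minimal Rust sketch of one way a handler result could be mapped onto the `domain` and `outcome` labels; it assumes `rejected` means business logic refused the command and `error` means an infrastructure or unexpected failure. `CommandError`, `Outcome`, and `command_labels` are hypothetical names, not Angzarr types.

```rust
/// Hypothetical failure modes of a command handler.
#[derive(Debug)]
enum CommandError {
    /// Business logic refused the command (e.g. a violated invariant).
    Rejected,
    /// Infrastructure or unexpected failure (storage, bus, timeout).
    Internal,
}

/// The three `outcome` label values from the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Success,
    Rejected,
    Error,
}

impl Outcome {
    fn from_result<T>(result: &Result<T, CommandError>) -> Self {
        match result {
            Ok(_) => Outcome::Success,
            Err(CommandError::Rejected) => Outcome::Rejected,
            Err(CommandError::Internal) => Outcome::Error,
        }
    }

    fn as_label(self) -> &'static str {
        match self {
            Outcome::Success => "success",
            Outcome::Rejected => "rejected",
            Outcome::Error => "error",
        }
    }
}

/// Label pairs that would accompany angzarr.command.duration / angzarr.command.total.
fn command_labels<T>(domain: &str, result: &Result<T, CommandError>) -> Vec<(&'static str, String)> {
    vec![
        ("domain", domain.to_string()),
        ("outcome", Outcome::from_result(result).as_label().to_string()),
    ]
}

fn main() {
    let results: [Result<u32, CommandError>; 3] =
        [Ok(1), Err(CommandError::Rejected), Err(CommandError::Internal)];
    for result in &results {
        println!("{:?}", command_labels("player", result));
    }
}
```

Keeping rejections separate from errors lets dashboards show business-rule refusals without inflating the error rate.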
Traces
Angzarr propagates W3C Trace Context headers across gRPC boundaries, so a single distributed trace can follow a request through:
- Client → Coordinator → Business Logic
- Event Bus → Projector/Saga → Business Logic
- Saga → Target Aggregate
Each span includes:
- `domain` attribute
- `correlation_id` (if present)
- Event/command type URLs
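The `traceparent` header defined by W3C Trace Context has the form `version-trace_id-parent_id-trace_flags`. A minimal Rust sketch that parses a version `00` header, as it might arrive in gRPC metadata; `TraceParent` and `parse_traceparent` are hypothetical names, and a real service would rely on OpenTelemetry's TraceContext propagator rather than hand-rolled parsing.

```rust
/// Fields of a W3C Trace Context `traceparent` header (version 00).
#[derive(Debug, PartialEq)]
struct TraceParent {
    trace_id: String,
    parent_id: String,
    sampled: bool,
}

/// True if `s` is exactly `len` lowercase hexadecimal characters.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b))
}

/// Parse `00-<32-hex trace-id>-<16-hex parent-id>-<2-hex flags>`; None if malformed.
fn parse_traceparent(header: &str) -> Option<TraceParent> {
    let parts: Vec<&str> = header.trim().split('-').collect();
    let [version, trace_id, parent_id, flags] = parts.as_slice() else {
        return None; // version 00 has exactly four fields
    };
    if *version != "00"
        || !is_lower_hex(trace_id, 32)
        || !is_lower_hex(parent_id, 16)
        || !is_lower_hex(flags, 2)
    {
        return None;
    }
    // All-zero trace and parent IDs are invalid.
    if trace_id.bytes().all(|b| b == b'0') || parent_id.bytes().all(|b| b == b'0') {
        return None;
    }
    let flag_bits = u8::from_str_radix(flags, 16).ok()?;
    Some(TraceParent {
        trace_id: trace_id.to_string(),
        parent_id: parent_id.to_string(),
        // Bit 0x01 is the "sampled" flag.
        sampled: (flag_bits & 0x01) != 0,
    })
}

fn main() {
    let header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
    println!("{:?}", parse_traceparent(header));
}
```

The trace ID stays the same across every hop listed above, while each service replaces the parent ID with its own span ID before forwarding the header.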
Kubernetes Deployment
Deploy Observability Stack
```bash
# Add Helm repos
helm repo add grafana https://grafana.github.io/helm-charts
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

# Deploy observability stack
helm install angzarr-otel ./deploy/helm/observability -n monitoring --create-namespace
```
This deploys:
- OTel Collector — OTLP receiver on port 4317 (gRPC) and 4318 (HTTP)
- Tempo — Distributed tracing backend
- Prometheus — Metrics via remote write
- Loki — Log aggregation
- Grafana — Visualization (NodePort 30300)
Enable OTel on Angzarr
```bash
helm install angzarr ./deploy/helm/angzarr \
  -f values-local.yaml \
  -f values-observability.yaml \
  -n angzarr
```
Grafana Dashboards
Pre-built dashboards are deployed automatically:
- Command Pipeline — Throughput, latency percentiles, error rates
- Event Bus — Publish throughput, latency distribution
- Orchestration — Saga execution, retry rates, compensations
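Latency percentile panels, and the `HighCommandLatency` alert in the next section, rely on PromQL's `histogram_quantile`. It estimates a quantile from cumulative `le` buckets by locating the bucket that contains the target rank and interpolating linearly inside it. A simplified Rust sketch of that estimate, assuming buckets are sorted, counts never decrease, and the last bucket is `+Inf`. The bucket values are made up for illustration, and out-of-range `q` (which PromQL maps to ±Inf) returns `None` here.

```rust
/// One cumulative bucket: `count` observations were <= `upper_bound` (the `le` label).
#[derive(Debug, Clone, Copy)]
struct Bucket {
    upper_bound: f64,
    count: f64,
}

/// Simplified estimate in the style of PromQL's histogram_quantile.
fn histogram_quantile(q: f64, buckets: &[Bucket]) -> Option<f64> {
    if !(0.0..=1.0).contains(&q) || buckets.len() < 2 {
        return None;
    }
    let last = buckets.len() - 1;
    if buckets[last].upper_bound != f64::INFINITY || buckets[last].count <= 0.0 {
        return None;
    }
    // Target rank: number of observations at or below the quantile.
    let rank = q * buckets[last].count;
    // First bucket whose cumulative count reaches the rank.
    let b = buckets.iter().position(|bucket| bucket.count >= rank)?;
    if b == last {
        // The rank falls in the +Inf bucket: report the highest finite bound.
        return Some(buckets[last - 1].upper_bound);
    }
    if b == 0 && buckets[0].upper_bound <= 0.0 {
        return Some(buckets[0].upper_bound);
    }
    // Interpolate linearly between the previous bound (or 0) and this bound.
    let (start, below) = if b == 0 {
        (0.0, 0.0)
    } else {
        (buckets[b - 1].upper_bound, buckets[b - 1].count)
    };
    let end = buckets[b].upper_bound;
    let in_bucket = buckets[b].count - below;
    Some(start + (end - start) * (rank - below) / in_bucket)
}

fn main() {
    // Made-up example: 100 command latencies, bounds in seconds.
    let buckets = [
        Bucket { upper_bound: 0.1, count: 50.0 },
        Bucket { upper_bound: 0.5, count: 90.0 },
        Bucket { upper_bound: 1.0, count: 98.0 },
        Bucket { upper_bound: f64::INFINITY, count: 100.0 },
    ];
    for (label, q) in [("p50", 0.5), ("p95", 0.95), ("p99", 0.99)] {
        println!("{label}: {:?}", histogram_quantile(q, &buckets));
    }
}
```

With these buckets, p99 comes out as exactly 1.0: when the rank lands in the `+Inf` bucket, the estimate is capped at the highest finite bound. A threshold such as `> 1` can therefore only fire if the histogram has a finite bucket boundary above 1. In the alert, the inputs are per-second `rate()` values of each bucket rather than raw counts; the estimate depends only on ratios between bucket values, so the interpolation is unchanged.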
Alerting Examples
```yaml
groups:
  - name: angzarr
    rules:
      - alert: HighCommandLatency
        expr: histogram_quantile(0.99, rate(angzarr_command_duration_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High command latency on {{ $labels.domain }}"

      - alert: SagaCompensationSpike
        expr: rate(angzarr_saga_compensation_total[5m]) > 0.1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Saga compensations increasing"
```
Next Steps
- Infrastructure — Helm chart deployment
- Testing — Integration tests with observability