Distributed tracing
Distributed tracing is the third telemetry pillar named on the observability page, which did not give it a page of its own. It is also one of the three sources Netflix fuses (eBPF flow logs, IPC metrics, and distributed traces) into its dependency graph. The definitions below come from the OpenTelemetry docs, the canonical vendor-neutral reference.
What it is
- Span — “a unit of work or operation”; the building block. Carries a name, timing, parent/child links, and attributes (key-value metadata: user ID, HTTP route).
- Trace — “the path of a request through your application”: all spans sharing one trace ID, assembled into a parent/child hierarchy that maps the request across services.
- Context propagation — the core enabling mechanism: passing trace/span IDs across process, service, and datacenter boundaries so spans emitted anywhere can be correlated into one trace.
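To make the three terms concrete, here is a minimal standard-library sketch of spans, a shared trace ID, and context propagation across a service boundary. It is not the OpenTelemetry SDK: the names `SpanContext`, `Span`, `start_span`, `inject`, and `extract` are hypothetical, as are the example span names and route. The only thing borrowed from a real standard is the shape of the W3C Trace Context `traceparent` header (version, 32-hex trace ID, 16-hex parent span ID, flags).

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    """The part of a span that crosses process boundaries (hypothetical type)."""
    trace_id: str  # 32 hex chars, shared by every span in one trace
    span_id: str   # 16 hex chars, unique per span


@dataclass
class Span:
    name: str
    context: SpanContext
    parent_id: Optional[str] = None               # None marks the root span
    attributes: dict = field(default_factory=dict)


def start_span(name, parent=None, attributes=None):
    """Start a child of `parent`, or a new root span (and new trace) if there is none."""
    trace_id = parent.trace_id if parent else secrets.token_hex(16)
    context = SpanContext(trace_id, secrets.token_hex(8))
    parent_id = parent.span_id if parent else None
    return Span(name, context, parent_id, dict(attributes or {}))


def inject(context):
    """Serialize the context into an outgoing `traceparent` header (flags 01 = sampled)."""
    return {"traceparent": f"00-{context.trace_id}-{context.span_id}-01"}


def extract(headers):
    """Parse an incoming `traceparent` header; return None if absent or malformed."""
    parts = headers.get("traceparent", "").split("-")
    if len(parts) != 4 or len(parts[1]) != 32 or len(parts[2]) != 16:
        return None
    return SpanContext(parts[1], parts[2])


# Service A handles the request and calls service B over HTTP.
root = start_span("GET /checkout", attributes={"http.route": "/checkout"})
outgoing_headers = inject(root.context)

# Service B, in another process, continues the same trace from the header alone.
child = start_span("charge-card", parent=extract(outgoing_headers))
```

Because only the header crosses the boundary, service B needs no shared state with service A: the trace ID ties the two spans into one trace, and the parent span ID places the child in the hierarchy.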
How it complements the other signals
The three observability signals divide labor:
- Metrics (prometheus) answer “is something wrong, and how much?” (cheap, aggregate).
- Logs answer “what exactly happened here?” (detailed, local).
- Traces answer “where, across the whole request path, did the latency/error occur?” (cross-service, causal).
Traces are “structured logs with context, correlation, and hierarchy baked in,” which is why they’re the signal that reveals latency sources and service dependencies that metrics and logs alone can’t.
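The "where did the latency occur" question can be illustrated with a small sketch that computes each span's self time, meaning its duration minus the time spent in its direct children. The span data, service names, and function names here are invented for illustration, and the calculation assumes sibling spans do not overlap in time.

```python
from collections import defaultdict

# Hypothetical spans from one trace: (span_id, parent_id, service, start_ms, end_ms)
spans = [
    ("a", None, "gateway",    0, 480),
    ("b", "a",  "checkout",  10, 470),
    ("c", "b",  "inventory", 20,  60),
    ("d", "b",  "payments",  70, 450),
]


def self_times(spans):
    """Return {span_id: ms spent in the span itself, excluding direct children}.

    Assumes sibling spans run sequentially; concurrent children would
    need their intervals merged before subtracting.
    """
    child_total = defaultdict(int)
    for _span_id, parent_id, _service, start, end in spans:
        if parent_id is not None:
            child_total[parent_id] += end - start
    return {
        span_id: (end - start) - child_total[span_id]
        for span_id, _parent, _service, start, end in spans
    }


def slowest_service(spans):
    """Return (service, self_time_ms) for the span with the largest self time."""
    times = self_times(spans)
    service_of = {span_id: service for span_id, _p, service, _s, _e in spans}
    worst = max(times, key=times.get)
    return service_of[worst], times[worst]


print(slowest_service(spans))
```

A latency metric on the gateway would only show the 480 ms total. The parent/child links are what let the time be attributed to one hop in the request path.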
Why it matters to the spoke
Distributed tracing is the per-request view of the same dependency structure the service-topology shows in aggregate: a topology graph is, in part, traces summed over time. It is therefore load-bearing for the spoke’s “seams, not components” thesis, because a trace is the seam made visible, the literal record of a request crossing the boundaries between services where the hard problems live. Trace data is produced through opentelemetry instrumentation (the off-the-shelf layer), feeds the service-topology that site-reliability-engineering reads and aiops agents reason over, and supplies the latency SLIs that SLOs are defined on.
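The "traces summed over time" idea can be sketched as an aggregation: walk each trace's parent/child links and count every caller-to-callee service edge. This is an illustrative reduction only, not how Netflix or any particular topology tool builds its graph; the trace data and the `topology_edges` function are hypothetical.

```python
from collections import Counter

# Hypothetical traces: each is a list of (span_id, parent_id, service) tuples.
traces = [
    [("a", None, "gateway"), ("b", "a", "checkout"), ("c", "b", "payments")],
    [("a", None, "gateway"), ("b", "a", "checkout"), ("c", "b", "inventory")],
    [("a", None, "gateway"), ("b", "a", "search")],
]


def topology_edges(traces):
    """Count caller -> callee service edges across many traces."""
    edges = Counter()
    for spans in traces:
        # Span IDs are only unique within a trace, so build the lookup per trace.
        service_of = {span_id: service for span_id, _parent, service in spans}
        for _span_id, parent_id, service in spans:
            caller = service_of.get(parent_id)
            # Skip root spans, orphaned spans, and in-process child spans.
            if caller is not None and caller != service:
                edges[(caller, service)] += 1
    return edges


edges = topology_edges(traces)
for (caller, callee), count in sorted(edges.items()):
    print(f"{caller} -> {callee}: {count}")
```

Each edge in the result is a seam between two services, and the count is how many of the sampled requests crossed it.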
Related
observability · service-topology · opentelemetry · prometheus · service-level-objectives · netflix-service-topology · platform-ops