
platform-ops-wiki

Tech Article, updated 2026-06-05

Netflix Service Topology: mapping thousands of microservices in real time

InfoQ news piece (2026-06) on Service Topology, an internal Netflix system that builds and maintains a live dependency graph of thousands of microservices in near real-time, so engineers can see how services connect and troubleshoot distributed failures faster. It is a concrete, production-scale instance of service-topology as an observability capability.

The problem

Engineers troubleshooting distributed systems need a unified view of service dependencies, of a failure’s blast radius, and of whether an issue is local or upstream. Existing observability tools offered only fragmented views.
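To make the two questions concrete, here is a minimal sketch of how a dependency graph answers them. It is illustrative only and not Netflix's implementation: the service names, the (caller, callee) edge format, and the functions `blast_radius` and `dependencies_of` are all assumptions made for this example.

```python
from collections import defaultdict, deque

# Hypothetical dependency edges as (caller, callee) pairs; names are made up.
EDGES = [
    ("web", "api"),
    ("api", "auth"),
    ("api", "catalog"),
    ("catalog", "db"),
    ("auth", "db"),
]


def _reachable(adjacency, start):
    """Breadth-first search: every node reachable from `start`, excluding it."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour != start and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def blast_radius(edges, failed):
    """Services that directly or transitively call `failed`."""
    callers = defaultdict(set)
    for caller, callee in edges:
        callers[callee].add(caller)
    return _reachable(callers, failed)


def dependencies_of(edges, service):
    """Services that `service` directly or transitively calls.

    If one of these is unhealthy, the issue may not be local to `service`.
    """
    callees = defaultdict(set)
    for caller, callee in edges:
        callees[caller].add(callee)
    return _reachable(callees, service)


radius = blast_radius(EDGES, "db")
deps = dependencies_of(EDGES, "api")
print(sorted(radius))
print(sorted(deps))
```

With a single merged graph, both questions reduce to a traversal in one direction or the other, which is the kind of unified view the fragmented tools could not give.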

The three-source approach

Service Topology fuses three telemetry sources, each compensating for the others’ blind spots. This is the core design idea.

No single source is sufficient; merging all three yields a more complete graph than any one source alone.
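The merging idea can be sketched as a union of edge sets that remembers which source observed each edge. This is a rough illustration under stated assumptions, not the actual pipeline: the source labels `source_a`, `source_b`, and `source_c` are placeholders, and the sample edges and the `merge_sources` function are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (caller, callee)


def merge_sources(sources: Dict[str, Iterable[Edge]]) -> Dict[Edge, Set[str]]:
    """Union the edges from every source, recording which sources saw each edge."""
    merged: Dict[Edge, Set[str]] = defaultdict(set)
    for source_name, edges in sources.items():
        for edge in edges:
            merged[edge].add(source_name)
    return dict(merged)


# Hypothetical observations: each placeholder source sees only part of the graph.
observations = {
    "source_a": [("api", "auth"), ("api", "catalog")],
    "source_b": [("api", "auth"), ("catalog", "db")],
    "source_c": [("auth", "db")],
}

graph = merge_sources(observations)
for edge in sorted(graph):
    print(edge, sorted(graph[edge]))
```

Keeping the set of contributing sources per edge is a design choice of this sketch: an edge seen by several sources can be treated as better corroborated than one seen by a single source.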

Architecture

Stack

Apache Pekko Streams (processing pipeline), Apache Kafka (multi-region consumption), gRPC (topology query API), and Netflix’s internal distributed key-value store (graph storage).

Why it routed here

Production observability / SRE tooling — the third tight ops piece that triggered the spin-out of this spoke. Pairs with google-sre-agentic-ai (which uses observability + topology for incident investigation) and kubernetes-integration-tax (observability tooling integration in production). See synthesis.