Defined Term domain updated Fri Jun 05 2026 00:00:00 GMT+0000 (Coordinated Universal Time)

Platform Ops (umbrella)

The practice of running and operating cloud-native distributed systems in production — the work that begins after software is written and deployed. This wiki’s umbrella concept, spanning three intertwined sub-disciplines that the founding sources each illuminate:

site-reliability-engineering — the reliability/operations discipline and its incident-response practice (google-sre-agentic-ai).
platform-engineering — building and integrating the internal platform that application teams run on (kubernetes-integration-tax).
observability — knowing what the system is actually doing, including its service-topology (netflix-service-topology).

Cutting across all three: aiops — AI/agents applied to operations.

The through-line

The founding sources converge on one theme: in production, the hard problem is the seams, not the components. Individual tools, services, and telemetry sources each work in isolation; the operational cost — and the value of platform-ops work — lies in making them cohere into a system you can understand and trust under failure. See synthesis.

Platform Ops (umbrella)

The through-line

Linked from