AIOps (AI / agents applied to operations)
The application of machine learning and, increasingly, LLM agents to operations work — anomaly detection, incident triage, runbook generation, postmortem drafting, and incident investigation. Cuts across all three pillars of platform-ops (site-reliability-engineering, platform-engineering, observability).
In the sources
google-sre-agentic-ai is the worked example: agentic AI across the SRE workflow, augmenting (not replacing) humans, on a Gemini + ADK + MCP stack. Its design mandates are the substance — SLOs + fallbacks, identity/permissions, explainability over black-box, auditability — i.e. agents in ops must be constrained and accountable.
Boundary note (cross-spoke)
AIOps is agents applied to ops, not agent-building tooling. The builder stack
(ADK, MCP, Gemini Enterprise Agent Platform) is documented in agentic-tooling-wiki;
this page owns the operational application of it. Watch for sources that bridge the
two (e.g. agents that act over a service-topology graph).