DORA metrics (the “Four Keys”)
The delivery-performance quantification framework — the other half of the “Quantification” open question that service-level-objectives only half-answered. Where SLOs measure the reliability of the running service (is the user’s experience good enough?), DORA measures the delivery pipeline that ships changes to it (how fast and how safely do changes reach production?). From Google’s DORA research program (the State of DevOps reports / Accelerate). URL-only ingest; source = dora.dev.
The metrics — throughput vs stability
Originally four keys, now framed as five, split into two axes:
Throughput (speed):
1. Deployment Frequency: how often you deploy to production.
2. Change Lead Time: time from commit to running in production.
Stability (safety):
3. Change Fail Rate: share of deployments needing immediate remediation (rollback/hotfix).
4. Failed Deployment Recovery Time: time to recover from a failed deployment (replaces “MTTR / time to restore”; note the term shift: it is now specifically deployment recovery).
5. Deployment Rework Rate (newer): share of unplanned deployments following production incidents.
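To make the five definitions concrete, here is a minimal sketch that derives them from a list of deployment records. The `Deployment` record, its field names, and the choice of the median as the summary statistic are assumptions made for illustration; dora.dev does not prescribe a schema, and real pipelines would pull these timestamps from CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; field names are illustrative only.
    committed_at: datetime                   # commit time of the change
    deployed_at: datetime                    # time it was running in production
    failed: bool = False                     # needed immediate remediation
    recovered_at: Optional[datetime] = None  # when a failed deployment was recovered
    unplanned: bool = False                  # shipped in response to a production incident


def _hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_metrics(deploys: list[Deployment], window_days: int) -> dict:
    """Summarise one observation window of deployments into the five metrics."""
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")

    n = len(deploys)
    lead_times = [_hours(d.deployed_at - d.committed_at) for d in deploys]
    recoveries = [
        _hours(d.recovered_at - d.deployed_at)
        for d in deploys
        if d.failed and d.recovered_at is not None
    ]
    return {
        # Throughput
        "deployment_frequency_per_day": n / window_days,
        "change_lead_time_hours": median(lead_times),
        # Stability
        "change_fail_rate": sum(d.failed for d in deploys) / n,
        "failed_deployment_recovery_hours": median(recoveries) if recoveries else None,
        "deployment_rework_rate": sum(d.unplanned for d in deploys) / n,
    }
```

The function returns raw values only. It deliberately does not map them onto Elite/Low tiers, since those thresholds are re-baselined each year.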
Performance is bucketed into tiers (Elite → Low) against evolving annual benchmarks — exact thresholds move year to year (not pinned here; they’re re-baselined in each State of DevOps report).
The headline finding
“Speed and stability are not tradeoffs.” Top performers score well on all axes at once — high throughput and high stability — overturning the intuition that shipping faster means breaking more. This is the empirical backbone under the wiki’s gitops / aiops control-loop bets: better integration of the delivery system buys both speed and safety, not one at the other’s expense.
Where it sits — complements, doesn’t duplicate, SLOs
- service-level-objectives = is the service reliable enough for users right now? (SLI/SLO/error-budget, user-facing).
- DORA = is the org’s change-delivery process fast and safe? (pipeline-facing).
- They interlock: an error budget gates whether to ship; DORA measures how well the shipping itself performs. Change Fail Rate + Recovery Time are the delivery-side echo of the reliability the SLO protects, and a blown error budget should show up as DORA stability pressure.
- Deployment metrics make the gitops loop measurable (reconcile-from-Git → deploy frequency / lead time); the aiops reliability paradox gains a yardstick too — an ops agent that drafts and applies fixes is itself a “deployer” whose Change Fail Rate / Recovery Time can be tracked.
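The interlock described in the bullets above can be sketched in two small pieces: an error-budget check that gates shipping, and a Change Fail Rate broken down by deployer so that an ops agent is measured the same way as a human. All function names, the event-count form of the SLI, and the `(deployer, failed)` record shape are hypothetical; this is one plausible arrangement under those assumptions, not a prescribed method.

```python
from collections import defaultdict


def error_budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent; negative once the budget is blown."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events == 0:
        return 1.0  # nothing observed yet, so nothing spent
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


def may_ship(slo_target: float, good_events: int, total_events: int) -> bool:
    """SLO side: shipping is gated on budget being left."""
    return error_budget_remaining(slo_target, good_events, total_events) > 0.0


def change_fail_rate_by_deployer(records: list[tuple[str, bool]]) -> dict[str, float]:
    """DORA side: Change Fail Rate per deployer label (for example a human team or an ops agent)."""
    totals: dict[str, int] = defaultdict(int)
    failures: dict[str, int] = defaultdict(int)
    for deployer, failed in records:
        totals[deployer] += 1
        failures[deployer] += int(failed)
    return {deployer: failures[deployer] / totals[deployer] for deployer in totals}
```

With a 99.9% target over 1,000,000 requests, 400 bad requests leave roughly 60% of the budget and `may_ship` returns `True`; 2,000 bad requests overspend it and the gate closes.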
Caveat
These are delivery-process metrics, not a reliability guarantee. They are gameable (deploy tiny no-op changes to pad frequency) and only meaningful as a set: read all four/five together or the picture is incomplete. Benchmark tiers are vendor-defined (Google/DORA) and re-baselined yearly.
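A small worked example of the gaming concern, with made-up numbers: padding a window with no-op deployments raises Deployment Frequency and dilutes Change Fail Rate at the same time, even though the number of real changes and real failures is unchanged. The function name and inputs are hypothetical.

```python
def frequency_and_fail_rate(
    real_deploys: int, failed_deploys: int, noop_deploys: int, window_days: int
) -> tuple[float, float]:
    """Return (deployments per day, change fail rate) for one window."""
    total = real_deploys + noop_deploys
    return total / window_days, failed_deploys / total


# Same 10 real changes and 2 failures over 5 days, without and with 30 no-op deploys.
honest = frequency_and_fail_rate(real_deploys=10, failed_deploys=2, noop_deploys=0, window_days=5)
padded = frequency_and_fail_rate(real_deploys=10, failed_deploys=2, noop_deploys=30, window_days=5)
```

Here `honest` is 2 deployments per day at a 20% fail rate, while `padded` is 8 per day at 5%. The absolute failure count did not move, which is one reason to read the ratios alongside raw counts and the other metrics rather than alone.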
Related
service-level-objectives · gitops · aiops · site-reliability-engineering · platform-ops · synthesis