Defined Term practice updated Thu Jun 04 2026 00:00:00 GMT+0000 (Coordinated Universal Time)

Agent guardrails (autonomy boundaries)

The discipline of bounding what an agent may do without a human checkpoint — the safety counterpart to autonomy. Its organizing principle is reversibility / recovery cost: grant high autonomy where a mistake is cheap to undo (refactors, unit tests), require pre-execution approval where the penalty is steep or irreversible (dropped production tables, prod deploys, IAM changes) agents-never-do-alone. Autonomy is graduated by blast radius, not all-or-nothing.

The bright lines

Categories that should require human approval before execution agents-never-do-alone: destructive file ops (rm -rf, git reset --hard), DB writes/migrations (DROP, TRUNCATE, DELETE sans WHERE), cloud infra (terraform apply, kubectl delete, IAM), production deployments (regardless of code quality), auth/security logic (failures surface in incident reports, not unit tests), and secrets/credentials.

Mechanisms

AGENTS.md contract — setup + coding + safety rules + “Definition of Done”; the cross-vendor sibling of the CLAUDE.md / spec-driven-development “constitution”.
blocked_commands.md — an explicit block list; don’t trust the agent to infer limits.
Two-agent review loop — an implementer + an egoless diff-reviewer: the agent-orchestration adversarial-verification pattern repurposed as a safety gate.
Human-in-the-loop checkpoints — the gating itself; in framework terms, LangChain’s HumanInTheLoopMiddleware (agent-middleware) and the supervised-board model of agent-kanban.
Final reports + deploy-timing reserved for humans (the agent lacks live-incident context).

Where it sits

This names and operationalizes a reliability/governance thread recurring across the wiki: microsoft-scout‘s continuous policy-conformance audit, hermes-agent‘s command-approval / container isolation, and the review/eval discipline behind the agentic-coding-harness (“structure substitutes for capability” includes structure that says no). It’s the explicit counterweight to the autonomy push of durable-agents and self-improving-agents — and a strong candidate to be itself a stack of agent-middleware.

agents-never-do-alone · agentic-coding-harness · agent-orchestration · agent-middleware · spec-driven-development · microsoft-scout · hermes-agent

Agent guardrails (autonomy boundaries)

The bright lines

Mechanisms

Where it sits

Related

Linked from