Agent guardrails (autonomy boundaries)
The discipline of bounding what an agent may do without a human checkpoint — the safety counterpart to autonomy. Its organizing principle is reversibility / recovery cost: grant high autonomy where a mistake is cheap to undo (refactors, unit tests); require pre-execution approval where the penalty is steep or irreversible (dropped production tables, prod deploys, IAM changes; see agents-never-do-alone). Autonomy is graduated by blast radius, not all-or-nothing.
The bright lines
Categories that should require human approval before execution (see agents-never-do-alone):
- destructive file ops (rm -rf, git reset --hard)
- DB writes/migrations (DROP, TRUNCATE, DELETE sans WHERE)
- cloud infra (terraform apply, kubectl delete, IAM)
- production deployments (regardless of code quality)
- auth/security logic (failures surface in incident reports, not unit tests)
- secrets/credentials
Mechanisms
- AGENTS.md contract — setup + coding + safety rules + “Definition of Done”; the cross-vendor sibling of the CLAUDE.md / spec-driven-development “constitution”.
- blocked_commands.md — an explicit block list; don’t trust the agent to infer limits.
- Two-agent review loop — an implementer + an egoless diff-reviewer: the agent-orchestration adversarial-verification pattern repurposed as a safety gate.
- Human-in-the-loop checkpoints — the gating itself; in framework terms, LangChain’s HumanInTheLoopMiddleware (agent-middleware) and the supervised-board model of agent-kanban.
- Final reports + deploy-timing reserved for humans (the agent lacks live-incident context).
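As an added example (not from this page or from any project it names), a small execution gate that encodes the bright lines above as a block list in the spirit of blocked_commands.md: it splits an agent-proposed shell line into its simple commands so a benign prefix cannot hide a blocked one behind `&&`, and checks SQL for DROP/TRUNCATE and DELETE without WHERE. The rule contents, the `aws iam` check standing in for IAM changes, and the function names are assumptions.

```python
import re
import shlex

# Constructed rules in the spirit of blocked_commands.md (hypothetical
# contents, not a real file): (predicate over argv, bright-line category).
SHELL_RULES = [
    (lambda a: a[0] == "rm" and any(c in f for f in a[1:] if f[:1] == "-" for c in "rf"),
     "destructive file op"),
    (lambda a: a[:2] == ["git", "reset"] and "--hard" in a, "destructive file op"),
    (lambda a: a[:2] == ["terraform", "apply"], "cloud infra"),
    (lambda a: a[:2] == ["kubectl", "delete"], "cloud infra"),
    (lambda a: a[:2] == ["aws", "iam"], "cloud infra (IAM)"),
]
SQL_RULES = [
    (re.compile(r"^\s*(drop|truncate)\b", re.I), "DB migration"),
    # DELETE with no WHERE clause anywhere in the statement.
    (re.compile(r"^\s*delete\b(?!.*\bwhere\b)", re.I | re.S), "DELETE sans WHERE"),
]

def _simple_commands(command):
    """Yield argv per simple command, splitting on ; && || | & so a benign
    prefix ('echo ok && rm -rf /') cannot smuggle a blocked command past the gate."""
    lex = shlex.shlex(command, posix=True, punctuation_chars=True)
    lex.whitespace_split = True
    current = []
    for tok in lex:
        if tok in (";", "&&", "||", "|", "&"):
            if current:
                yield current
            current = []
        else:
            current.append(tok)
    if current:
        yield current

def classify_shell(command):
    """Return ('auto' | 'needs_approval', reasons) for an agent-proposed shell line."""
    reasons = []
    for argv in _simple_commands(command):
        if argv[0] == "sudo":          # sudo does not change what the command does
            argv = argv[1:]
        for rule, category in SHELL_RULES:
            if argv and rule(argv):
                reasons.append(f"{category}: {' '.join(argv)}")
    return ("needs_approval" if reasons else "auto", reasons)

def classify_sql(statement):
    """Same decision for a SQL statement the agent wants to run."""
    reasons = [category for pattern, category in SQL_RULES if pattern.search(statement)]
    return ("needs_approval" if reasons else "auto", reasons)
```

The gate only decides whether to pause for a human; the approval prompt itself belongs in the checkpoint mechanism (e.g. HumanInTheLoopMiddleware), which this block leaves out.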
Where it sits
This names and operationalizes a reliability/governance thread recurring across the wiki: microsoft-scout’s continuous policy-conformance audit, hermes-agent’s command-approval / container isolation, and the review/eval discipline behind the agentic-coding-harness (“structure substitutes for capability” includes structure that says no). It’s the explicit counterweight to the autonomy push of durable-agents and self-improving-agents — and a strong candidate to be itself a stack of agent-middleware.
Related
agents-never-do-alone · agentic-coding-harness · agent-orchestration · agent-middleware · spec-driven-development · microsoft-scout · hermes-agent