What AI Agents Should Never Do on Their Own (Towards Data Science)
A practitioner argument that “more freedom → better output” is incomplete: agent autonomy is only useful when bounded. The piece gives the wiki its first explicit framework for where to put the human checkpoints — the agent-guardrails concept.
Six things agents shouldn’t do autonomously
- Destructive file ops — `rm -rf`, `git reset --hard` (discard unrecoverable work).
- DB writes/migrations — `DELETE` without `WHERE`, `DROP`, `TRUNCATE`, prod schema changes.
- Cloud infra — `terraform apply`, `kubectl delete`, IAM changes.
- Production deployments — any release to live, “regardless of code quality.”
- Auth/security logic — code that “looks correct” but fails on edge cases (“bugs here… show up in incident reports, sometimes months later”).
- Secrets/credentials — reading or writing `.env` files and API keys.
The organizing lens — reversibility / recovery cost
Assess every task by “can this be undone?” Low recovery cost (refactors, unit tests) → high autonomy; steep penalty (dropped prod table) → mandatory pre-execution checkpoint. Autonomy is graduated by blast radius, not all-or-nothing.
Concrete guardrail mechanisms
- `AGENTS.md` contract — repo file with setup, coding rules, safety rules, and a “Definition of Done”; the agreed contract before work begins (the cross-vendor sibling of the CLAUDE.md / spec-driven-development “constitution”).
- `blocked_commands.md` — an explicit block list of destructive ops needing approval; “use explicit block lists rather than trusting agents to infer boundaries.”
- Two-agent loop — one implements, a second reviews the diff “without ego investment” — the agent-orchestration adversarial-verification pattern applied as a safety gate.
- Final reports — summarize changes, files, test results, risks, and incomplete work.
- Never let the agent pick deploy timing — “you know what’s in flight, what incidents are open… the agent doesn’t have any of that context.”
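A pre-execution checkpoint that turns the reversibility lens and the `blocked_commands.md` idea into a runnable gate, serving both the recovery-cost paragraph above and the block-list mechanism. This is an added example, not code from the article or from any agent vendor. Everything in it is assumed: the three tiers (AUTO / APPROVE / DENY), the rule set standing in for a hypothetical `blocked_commands.md`, the `.env` and `deploy`-script path conventions, and two rules that go slightly beyond the article's list (`terraform destroy`, and `UPDATE` without `WHERE` treated like `DELETE` without `WHERE`). Sample commands at the bottom are constructed.

```python
"""Pre-execution checkpoint for agent-proposed commands, keyed on recovery cost.

Added example, not the article's code. The rules encode a hypothetical
blocked_commands.md; tiers follow the reversibility lens: AUTO when the work
is cheap to undo, APPROVE when recovery is expensive, DENY for bright lines.
"""
from __future__ import annotations
import re
import shlex
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    AUTO = 0      # low recovery cost: run unattended
    APPROVE = 1   # steep recovery cost: a human confirms before execution
    DENY = 2      # never autonomous: secrets, deploy timing


@dataclass(frozen=True)
class Decision:
    tier: Tier
    reason: str
    command: str


_SEPARATORS = {"&&", "||", ";", "|", "&"}               # split compound lines
_WRAPPERS = {"sudo", "env", "nohup", "time", "xargs"}   # strip, then re-check
_SHELLS = {"sh", "bash", "zsh", "dash"}                 # `sh -c '...'` is re-scanned
_SECRET_PATH = re.compile(r"(^|/)\.env(\.[\w-]+)?$")     # .env, .env.prod, cfg/.env (assumed)
_DEPLOY_SCRIPT = re.compile(r"(^|/)deploy(\.\w+)?$")     # deploy, ./deploy.sh (assumed)


def _split_shell(command: str) -> list[list[str]]:
    """Tokenise with POSIX quoting rules and split on shell control operators."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    commands, current = [], []
    for tok in lexer:
        if tok in _SEPARATORS:
            if current:
                commands.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        commands.append(current)
    return commands


def _rm_is_destructive(flags: list[str]) -> bool:
    # -r/-R/--recursive or -f/--force, including combined forms such as -rf / -fr
    for tok in flags:
        if tok in ("--recursive", "--force"):
            return True
        if tok.startswith("-") and not tok.startswith("--") and set(tok) & set("rRf"):
            return True
    return False


def _check_simple(argv: list[str]) -> tuple[Tier, str]:
    while argv and argv[0] in _WRAPPERS:   # wrapper options (e.g. sudo -u) are not parsed
        argv = argv[1:]
    if not argv:
        return Tier.AUTO, "empty command"
    if len(argv) >= 3 and argv[0] in _SHELLS and argv[1] == "-c":
        inner = check_shell(argv[2])        # recurse into the quoted command string
        return inner.tier, f"via {argv[0]} -c: {inner.reason}"
    head, rest = argv[0], argv[1:]
    # Bright lines (DENY) first, then expensive-to-undo operations (APPROVE).
    if any(_SECRET_PATH.search(t) for t in argv):   # includes redirection targets
        return Tier.DENY, "reads or writes a .env file (secrets/credentials)"
    if any(_DEPLOY_SCRIPT.search(t) for t in argv):
        return Tier.DENY, "production deployment: the human picks deploy timing"
    if head == "rm" and _rm_is_destructive(rest):
        return Tier.APPROVE, "forced/recursive rm: deleted files are unrecoverable"
    if head == "git" and "reset" in rest and "--hard" in rest:
        return Tier.APPROVE, "git reset --hard discards uncommitted work"
    if head == "terraform" and rest[:1] and rest[0] in ("apply", "destroy"):
        return Tier.APPROVE, f"terraform {rest[0]} changes live infrastructure"
    if head == "kubectl" and rest[:1] == ["delete"]:
        return Tier.APPROVE, "kubectl delete removes live cluster objects"
    if head in ("aws", "gcloud") and rest[:1] == ["iam"]:
        return Tier.APPROVE, "IAM change alters who can do what in the account"
    return Tier.AUTO, "no destructive operation matched"


def check_shell(command: str) -> Decision:
    """Classify a whole shell line by its most dangerous simple command."""
    try:
        parts = _split_shell(command)
    except ValueError as exc:                # unbalanced quotes: fail closed
        return Decision(Tier.APPROVE, f"unparseable command ({exc})", command)
    worst = Decision(Tier.AUTO, "no destructive operation matched", command)
    for argv in parts:
        tier, reason = _check_simple(argv)
        if tier.value > worst.tier.value:
            worst = Decision(tier, reason, command)
    return worst


def check_sql(statement: str) -> Decision:
    """Classify one SQL statement: unscoped DELETE/UPDATE and DDL need approval."""
    sql = " ".join(statement.split()).rstrip(";").upper()
    verb = sql.split(" ", 1)[0] if sql else ""
    if verb in ("DELETE", "UPDATE") and " WHERE " not in sql:
        return Decision(Tier.APPROVE, f"{verb} without WHERE touches every row", statement)
    if verb in ("DROP", "TRUNCATE", "ALTER"):
        return Decision(Tier.APPROVE, f"{verb} is a table- or schema-level change", statement)
    return Decision(Tier.AUTO, "scoped read or write", statement)


if __name__ == "__main__":
    # Constructed sample proposals an agent might emit mid-task.
    for cmd in ["pytest -q tests/", "rm -rf build/", "git status && git reset --hard HEAD~1",
                "sudo sh -c 'rm -rf /var/lib/app'", "echo KEY=1 >> .env.production", "make deploy"]:
        d = check_shell(cmd)
        print(f"{d.tier.name:8} {cmd!r}: {d.reason}")
    for stmt in ["DELETE FROM sessions", "DELETE FROM sessions WHERE expires_at < now()", "DROP TABLE users;"]:
        d = check_sql(stmt)
        print(f"{d.tier.name:8} {stmt!r}: {d.reason}")
```

Two design points follow from the article's own framing. First, the gate unwraps `sudo` and `sh -c '...'` and splits `&&`/`|` chains before matching, because a block list that only looks at the first token is trivially routed around. Second, a block list is a floor, not a ceiling: an agent with a real shell has many other ways to express the same effect (`find -delete`, a one-line interpreter script), so pair this with a sandbox or allowlist and the two-agent diff review rather than trusting it alone, which is exactly the "explicit block lists rather than inferred boundaries" stance above.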
Why it matters here
It turns the wiki’s scattered reliability/governance mentions (Scout’s policy-conformance audit, Hermes’ command-approval, the harness review/eval discipline) into a named, usable discipline — a permission matrix keyed on reversibility, not vibes. It’s the human-in-the-loop counterpart to autonomy: where durable-agents and self-improving-agents push agents to do more alone, this maps the bright lines they shouldn’t cross. (Caveat: single practitioner op-ed, no benchmarks.)
Related
agent-guardrails · agentic-coding-harness · agent-orchestration · spec-driven-development · agent-middleware · durable-agents