Software Source Code · updated 2026-06-16 (UTC)

autoresearch

Andrej Karpathy’s proof-of-concept for the “agentic scientist”: AI agents that autonomously run ML research overnight on a single GPU, iterating on nanochat (Karpathy’s minimal single-GPU LLM trainer). It is the origin of the autoresearch/iteration paradigm that autonovel credits, so the two now sit in the wiki as sibling instances of the same idea: one doing science, one doing fiction. MIT-licensed; ~87k GitHub stars.

The loop

The agent reads program.md (human-written guidance) and proposes edits to train.py only — architecture, hyperparameters, optimizer, batch size.
Runs training for a fixed 5-minute wall-clock budget.
Measures validation bits-per-byte (val_bpb) — lower is better.
Keeps improvements, discards failures, repeats — ~12 experiments/hour, ~100 per overnight run.
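For reference, bits-per-byte normalizes the model's summed cross-entropy by the number of raw text bytes rather than by tokens, which keeps the score comparable when tokenization differs. A minimal sketch of that conversion, assuming the validation loss is available as a sum in nats; the function name and arguments are illustrative and not taken from nanochat:

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (nats) into bits per byte.

    total_nll_nats: cross-entropy summed over every predicted token (assumed input).
    total_bytes:    size in bytes of the text those tokens encode.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by bytes normalizes.
    return total_nll_nats / (math.log(2) * total_bytes)
```

A model that spends exactly one bit per byte of validation text scores 1.0; lower is better.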

Humans write program.md and review the logs on waking; the experimental loop itself runs with no human in it.
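The four steps above amount to a greedy keep-or-discard search. The sketch below is one way to express that shape, not the repository's actual code: `propose` stands in for the agent editing train.py, `train_and_eval` stands in for a 5-minute training run that returns val_bpb, and both names (along with `LoopState` and `run_loop`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class LoopState:
    best_source: str   # current best version of train.py
    best_bpb: float    # its validation bits-per-byte (lower is better)
    log: List[Tuple[int, str, Optional[float]]] = field(default_factory=list)


def run_loop(
    baseline_source: str,
    propose: Callable[[str], str],           # hypothetical: agent proposes an edit
    train_and_eval: Callable[[str], float],  # hypothetical: fixed-budget run -> val_bpb
    n_experiments: int,
) -> LoopState:
    state = LoopState(baseline_source, train_and_eval(baseline_source))
    for i in range(n_experiments):
        candidate = propose(state.best_source)
        try:
            bpb = train_and_eval(candidate)
        except Exception:
            # A crashed run counts as a failed experiment and is discarded.
            state.log.append((i, "crashed", None))
            continue
        if bpb < state.best_bpb:
            state.best_source, state.best_bpb = candidate, bpb
            state.log.append((i, "kept", bpb))
        else:
            state.log.append((i, "discarded", bpb))
    return state
```

The throughput figures follow from the budget: 60 minutes divided by 5 minutes per run gives about 12 experiments per hour, so roughly 100 experiments corresponds to about 8 hours, ignoring per-run overhead. The `log` list is the part a human would review on waking.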

Two design choices that make it work

Constrained action surface: agents edit only train.py; prepare.py (data/tokenization) and core infra are off-limits. Single-file diffs stay reviewable, so containment acts as an enabler of autonomy, not just a safety tax.
Fixed time budget: every experiment gets the same 5 minutes of wall-clock time, so architecturally different runs are directly comparable. The budget is the stop condition.
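Both choices are simple to enforce mechanically. The following is a hedged sketch under stated assumptions rather than the project's implementation: `check_action_surface` and `train_with_budget` are hypothetical helpers, and the list of changed files is assumed to come from whatever diffing the harness already does.

```python
import time
from typing import Callable, Iterable

EDITABLE = frozenset({"train.py"})  # the only file agents may touch, per the text
BUDGET_SECONDS = 5 * 60             # fixed wall-clock budget per experiment


def check_action_surface(changed_files: Iterable[str]) -> None:
    """Reject any proposal that edits a file outside the allowed surface."""
    off_limits = sorted(set(changed_files) - EDITABLE)
    if off_limits:
        raise PermissionError(f"edits outside train.py rejected: {off_limits}")


def train_with_budget(
    step: Callable[[], None],                   # hypothetical: one optimizer step
    budget_seconds: float = BUDGET_SECONDS,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Run training steps until the wall-clock budget is spent.

    The budget is the only stop condition, so every run gets the same time
    regardless of architecture. Returns the number of steps completed.
    """
    deadline = clock() + budget_seconds
    steps = 0
    while clock() < deadline:
        step()
        steps += 1
    return steps
```

In this sketch the deadline is checked between steps, so a run can overshoot by at most one step. A faster architecture simply completes more steps in the same window, which is what makes the resulting val_bpb values comparable.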

Why it matters here — the feedback-signal contrast

autoresearch is the cleanest instance yet of loop-engineering / agent-loops-verification (“the bottleneck is the feedback signal, not generation”), and it sharpens that thesis by contrast with autonovel:

autoresearch has an objective, ground-truth metric (val_bpb) — the easy case; the loop can trust its own signal for free.
autonovel has no ground-truth metric for prose quality, so it had to engineer a feedback signal (a mechanical scan + an LLM-judge). The harder case.

So the same paradigm spans a spectrum: where an objective metric exists, autonomy is nearly free; where it doesn’t, the research problem becomes building a trustworthy evaluator. That is the loop thesis’s load-bearing claim made concrete across two repos.
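One way to picture the spectrum is that the loop stays the same and only the evaluator changes. The sketch below is illustrative only: `Evaluator`, `objective_evaluator`, `engineered_evaluator`, and `repeated_word_ratio` are hypothetical names, the weighting scheme is an assumption, and none of it reflects autonovel's actual scan or judge.

```python
from typing import Callable

# An evaluator maps an artifact (code or prose) to a score; lower is better.
Evaluator = Callable[[str], float]


def objective_evaluator(measure_val_bpb: Callable[[str], float]) -> Evaluator:
    """Easy case: a ground-truth metric exists, so it is used directly."""
    return measure_val_bpb


def engineered_evaluator(
    mechanical_scan: Callable[[str], float],  # hypothetical cheap rule-based score
    judge: Callable[[str], float],            # hypothetical LLM-judge score
    scan_weight: float = 0.5,                 # assumed blend; not from the source
) -> Evaluator:
    """Hard case: no ground truth, so the signal is assembled from proxies."""
    def evaluate(text: str) -> float:
        return scan_weight * mechanical_scan(text) + (1.0 - scan_weight) * judge(text)
    return evaluate


def repeated_word_ratio(text: str) -> float:
    """Toy stand-in for a mechanical scan: share of words that are repeats."""
    words = text.lower().split()
    if not words:
        return 1.0
    return 1.0 - len(set(words)) / len(words)
```

The objective evaluator is trustworthy by construction, while the engineered one is only as good as its proxies, which is where the research effort moves when no ground-truth metric exists.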

Tier

T1: first-party project repo (Karpathy’s own code), the spoke’s convention for project source. Self-described proof-of-concept, not a benchmarked system; recorded as such. Freshness: volatile (active repo).

Cross-spoke

Karpathy is a research-wiki node (andrej-karpathy, the mechanizing-reasoning lineage); this is agent tooling (an autonomous-agent loop system), so it is ingested here per the split, linking to the bridge node rather than duplicating it.

autonovel · andrej-karpathy · loop-engineering · agent-loops-verification · self-improving-agents · agent-guardrails · hermes-agent