Claude Opus 4.8 review (Simon Willison) — source summary
Simon Willison’s 28 May 2026 write-up of claude-opus-4-8. Delivered via Telegram,
ingested 2026-05-29. Text in raw/claude-opus-4-8-review.md.
What it says
- Anthropic frames the release, refreshingly, as “a modest but tangible improvement” over 4.7 — Willison’s favorite part is the honest, low-hype framing.
- Headline improvement is honesty: Opus 4.8 is “around four times less likely than its predecessor” to make unsupported claims, and more likely to flag uncertainties about its own work. (Anthropic notes models often “jump to conclusions, confidently claiming progress despite thin evidence.”)
- New capabilities he flags: mid-conversation system messages (
role: "system"after a user turn — steer instructions late in a long agentic loop without restating the prompt, preserving cache hits); lower prompt-cache minimum (1,024 tokens, down from 4,096 on 4.7). - Unchanged: $5/$25 per Mtok in/out (same as 4.5–4.7); Jan 2026 knowledge/training cutoff; 1M-token context; 128K max output. Fast mode now 2× base ($10/$50, down from $30/$150), research-preview only.
- Includes his signature “pelican riding a bicycle” SVG test across thinking levels.
Why it’s here
claude-opus-4-8 is the model substrate under the agent ecosystem this wiki tracks — claude-cowork, claude-managed-agents, and the Claude-driven implementations gbrain and llm-wiki-agent. Reflexively, it is also the model maintaining this very wiki, and its flagged improvement — flagging uncertainties rather than overclaiming — is precisely the discipline this wiki’s synthesis and lint passes try to enforce.