Log — LLM Providers Wiki

Append-only history. Each entry starts with `## [YYYY-MM-DD] <op> | <title>` where `<op>` is ingest, query, lint, or split, so `grep "^## \[" log.md | tail -5` works.
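The header convention above can also be read programmatically. The following is a minimal sketch, assuming headers match the stated shape exactly; the function name `last_entries` and the sample entries are illustrative, not part of any wiki tooling. The sample is built from a list of strings so that no line of the snippet starts with the header prefix and confuses the grep.

```python
import re

# Header shape taken from the convention stated above:
#   ## [YYYY-MM-DD] <op> | <title>
HEADER = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$")

def last_entries(text, n=5):
    """Return the last n (date, op, title) tuples, in file order."""
    found = []
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            found.append(match.groups())
    return found[-n:]

# Illustrative sample only (two headers and one body line).
sample = "\n".join([
    "## [2026-06-01] split | llm-providers-wiki created",
    "body text that is not a header",
    "## [2026-06-04] lint | full pass (13 pages)",
])

print(last_entries(sample, n=1))
# → [('2026-06-04', 'lint', 'full pass (13 pages)')]
```

Like the grep, this returns entries in file order rather than date order, so it reflects what was appended last, not what is most recent by date.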

## [2026-06-05] ingest | Gemma 4 with Quantization-Aware Training (hub-routed)

Hub-routed Telegram drop. Clear match (existing gemma-4); runner-up llm-inference-wiki (quantization-as-mechanism) declined — that spoke owns sampling/KV-cache/batching, not quantization, and the source’s angle is deployable footprint in the open-weight market. Ingested:

New source gemma-4-qat (source, url-only, BlogPosting): QAT checkpoints for E2B/E4B/26B-MoE; Q4_0 4-bit + a targeted 2-bit mobile scheme (mixed precision, channel-wise, static activation scaling); E2B text < 1 GB; QAT claimed above PTQ quality; GGUF/compressed-tensor on HF; broad local runtime list (llama.cpp/Ollama/LM Studio/MLX/vLLM/SGLang/Transformers.js/LiteRT-LM/Unsloth).
New concept quantization (DefinedTerm) — int4/2-bit footprint lever, QAT vs PTQ; scoped to the market/deployability angle, with the deeper inference-mechanics depth flagged as llm-inference-wiki’s (llm-inference) territory (bridge, no dup).
Updated gemma-4 (QAT bullet + expanded deployment list; also tidied a stray </content> tag).
Updated synthesis point 2: quantization is the second footprint lever after sparse MoE — footprint now contested by training technique, not just architecture.
Updated index (added quantization + the source). Volatile caveat: the “<1GB” / “higher-than-PTQ quality” figures are vendor claims, undated vs neutral benchmarks — snapshots. Site rebuilt + verified.

## [2026-06-01] split | llm-providers-wiki created from _inbox cluster `llm-providers` (4 sources)

The cluster was seeded by the parked deepseek-api-docs stub; on user instruction (“seek for content about llm-providers” → option (a)), the router web-searched the model/provider landscape and curated 3 strong sources to ingest, taking the cluster to 4 and triggering this spin-out.

Scaffolded from CLAUDE.template.md; domain = the LLM model & provider landscape. Ingested 4 sources (all URL-only): deepseek-api-docs (TechArticle), open-source-llms-2026 (HF, BlogPosting), llm-api-pricing-comparison (CloudZero, Article), llm-leaderboard-stats (llm-stats.com, Dataset). Created 4 concept pages (llm-provider, open-weight-models, llm-api-pricing, llm-benchmarks) + 1 provider page (deepseek). Synthesis thesis: collapsing price floor (open weights + DeepSeek) vs. still-premium proprietary frontier; “best model” is multi-axis. Cross-wiki bridges: anthropic/claude-opus-4-8 (research-wiki), llm-inference/kv-cache (llm-inference-wiki). Deleted the parked _inbox deepseek record. Caveat baked into the schema: pricing/rankings are volatile — pages are dated snapshots. Spoke count 8 → 9.

Note on sourcing: 3 of the 4 founding sources were router-found (web search), not human-dropped — a departure from the usual human-curated flow, done at explicit user request.

## [2026-06-04] ingest | Introducing Gemma 4 12B (Google blog)

Hub-routed (Telegram). New source summary gemma-4-12b-announcement (BlogPosting, url-only). Gemma 4 12B: a dense, mid-size, encoder-free multimodal open-weight model, with vision via a single matrix-mul embedding, native audio projected straight into the token space, and no separate encoders. Benchmarks “nearing” the 26B-A4B at <½ the memory; runs on 16GB VRAM/unified memory; Apache-2.0; HF/Kaggle; llama.cpp/MLX/vLLM/Transformers. New Thing pages: gemma-4 (SoftwareApplication; family: E4B / 12B / 26B-A4B) and google (Organization; first provider page beyond deepseek; dual-track Gemini + Gemma). Updated open-weight-models (added a “multimodality at the local tier” trend + footprint-as-battleground) and synthesis (modality/memory as a new axis; “dual-track labs” recurring read). Index updated with a new SoftwareApplication group. Gemma 4 was previously only name-dropped (the 26B-A4B) in open-source-llms-2026; this gives the family a real page + the new 12B variant. Snapshot caveat: benchmark/footprint claims are the vendor’s own.

## [2026-06-04] lint | full pass (13 pages)

Triggered after the Gemma 4 ingest. Structure green: no dangling wikilinks (all 12 local slugs resolve; the 4 non-local targets — anthropic, claude-opus-4-8, llm-inference, kv-cache — are the declared cross-wiki bridges); no orphans (every page linked ≥3×); index catalogs all pages + synthesis; @types correct (providers=Organization, models=SoftwareApplication, concepts=DefinedTerm, sources=CreativeWork subtypes) — none need narrowing.
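The two structural checks named above (dangling wikilinks, orphans) can be sketched as follows. This is an illustration under stated assumptions, not the linter actually used: it assumes `[[slug]]` or `[[slug|label]]` wikilink syntax, takes pages as an in-memory mapping, and the names `lint_links`, `bridges`, and `min_inbound` are hypothetical.

```python
import re

# Assumed wikilink syntax: [[slug]], [[slug|label]] or [[slug#anchor]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def lint_links(pages, bridges, min_inbound=1):
    """pages: slug -> body text; bridges: declared cross-wiki slugs.

    Returns (dangling, orphans): dangling is a sorted list of
    (source, target) pairs whose target is neither a local page nor a
    declared bridge; orphans are local pages linked from fewer than
    min_inbound other pages.
    """
    inbound = {slug: 0 for slug in pages}
    dangling = set()
    for source, body in pages.items():
        # Deduplicate so each linking page counts once per target.
        for target in {t.strip() for t in WIKILINK.findall(body)}:
            if target in pages:
                if target != source:
                    inbound[target] += 1
            elif target not in bridges:
                dangling.add((source, target))
    orphans = sorted(s for s, n in inbound.items() if n < min_inbound)
    return sorted(dangling), orphans

# Illustrative three-page wiki with one declared bridge.
pages = {
    "google": "Makes [[gemma-4]]; compare [[anthropic]].",
    "gemma-4": "From [[google]]; see [[quantization]].",
    "index": "[[google]] [[gemma-4]]",
}
print(lint_links(pages, bridges={"anthropic"}))
# → ([('gemma-4', 'quantization')], ['index'])
```

Raising `min_inbound` to 3 would approximate the "every page linked ≥3×" bar reported in this pass; whether the index itself is exempt from that bar is not stated in the entry.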

Fixed — missing cross-links (created by the new google/gemma-4 pages): linked previously plain-text mentions → google in llm-provider, llm-api-pricing, llm-api-pricing-comparison; gemma-4+google in open-source-llms-2026. Bumped those 4 pages’ updated: to 2026-06-04.

Flagged, not auto-fixed (need a source / judgment):

Opus version drift: pricing pages list Claude Opus 4.7 ($5/$25) while benchmarks/leaderboard/provider pages reference claude-opus-4-8 as current #2. Both are dated mid-2026 snapshots from different sources (not a contradiction), but Opus 4.8 pricing is uncaptured; refresh when a pricing source for 4.8 arrives. Don’t invent a number.
Gemini snapshot lag: llm-api-pricing-comparison quotes CloudZero’s Gemini 2.5 Flash $0.15/$0.60 while llm-api-pricing cites Gemini 3.1 Pro — faithful to each source, but the 2.5 figure is stale; flag on next pricing refresh.
Thin anchor: google (1.1KB) is the lightest page — a deliberate brief org anchor; optional enrichment (name the Gemini model line + its llm-benchmarks placement). Not an error.

## [2026-06-05] ingest | Google AI updates — May 2026 (hub-routed)

Hub-routed Telegram drop (broad Google “what we shipped in May” roundup). Routed here over runners-up search-marketing-wiki (Universal Cart / agentic-commerce, AI-search) and agentic-tooling-wiki (Android Halo / agent surfaces) because the source’s dominant in-scope substance is model releases. Ingested the model/provider signal; explicitly scoped out the consumer-product/hardware/search items (noted as cross-spoke context in the source page, not parked separately — facets of one roundup).

New source summary google-ai-updates-may-2026 (source, url-only, BlogPosting).
New Thing page gemini (SoftwareApplication): the closed-weight frontier family that the wiki kept referencing but had no page for (fills the gap the earlier 2026-06-04 lint flagged on google). Anchors Gemini 3.5 (“frontier intelligence for agents and coding”), Gemini Omni (multimodal video gen), Gemini for Science.
Updated google (Gemini section + dual-track-converges-on-agents/coding point; linked gemini).
Updated synthesis “Dual-track labs” recurring read: open/closed split is licensing+footprint, not target workload — both Gemini and Gemma now pitched at agents-and-coding.
Updated index (added gemini model + the source). Volatile caveat: Gemini 3.5/Omni facts are marketing-roundup positioning, undated benchmarks — treated as snapshots. No pricing given. Site rebuilt + verified.

## [2026-06-09] ingest | +3 major providers (OpenAI, Mistral AI, Llama) — all-spokes cron test

Filled the most glaring landscape gaps — the frontier incumbent and the non-Chinese open-weight pole: openai (Organization, src — proprietary GPT/o-series; the de-facto OpenAI-compatible API standard; MS partnership), mistral-ai (Organization, src — Europe’s leading lab; Apache-2.0 Mixtral MoE + proprietary API), llama (SoftwareApplication, src — Meta’s open-weight family that catalyzed the wave; open-weight but non-OSI custom license). Sharpens the licensing-spectrum thread (Llama vs Apache-2.0 gemma-4/mistral-ai). Wikipedia-sourced background; pricing/rankings remain dated snapshots. 16 → 19 pages.

## [2026-06-10] ingest | Qwen + xAI/Grok + Amazon Bedrock — all-spokes pass

Three new pages filling named-but-unpaged gaps. qwen (Org/SoftwareApplication, url, Wikipedia) — Alibaba’s leading open-first family the thesis kept citing: 0.6–32B dense + MoE (30B-A3B), reasoning/VL/audio/Coder/omni, mostly Apache-2.0; 200k+ HF derivatives, 234M app users; the strongest open-weight challenger to deepseek. xai-grok (Org/SoftwareApplication, url, Wikipedia) — xAI’s proprietary frontier model; 2M-token context (the “xAI leads context” claim), and an open→closed arc (Grok-1 Apache-2.0 → all later closed), the inverse of llama/qwen → sharpens the “open is a spectrum/strategy-over-time” thread. amazon-bedrock (Org/SoftwareApplication, url, AWS) — the wiki’s first cloud reseller: one-API aggregator of Claude/Llama/Mistral/Cohere/Nova (not a maker); the demand-side mirror where model choice is a config param and differentiation moves to routing/caching/data. Folded into synthesis (new 2026-06-10 section) + index (Organization rows). All vendor/encyclopedic snapshots — standing volatility caveat. No contradictions. 19 → 22 pages.

## [2026-06-11] ingest | Claude API — Refusals and Fallback (hub-routed)

Hub-routed Telegram drop (platform.claude.com). Classified against wikis.md: dominant substance is the Anthropic provider API surface + model routing + billing, so routed here over the runner-up agentic-tooling-wiki (the SDK-middleware / sub-agent-fallback angle — noted as cross-spoke adjacency, not split). Not research-wiki (that’s the model-substrate capability/cost bridge, not API mechanics) and not ai-governance-wiki (this is API-level classifier behavior, not policy/GRC). Ingested:

New source claude-refusals-and-fallback (source, url-only, TechArticle): stop_reason: "refusal" as an HTTP 200; stop_details.category (cyber/bio/frontier_llm/reasoning_extraction); three fallback paths (server-side fallbacks beta, SDK BetaRefusalFallbackMiddleware, manual); fable-5 → opus-4.8 chain; per-attempt billing via usage.iterations[]; fallback-credit beta; sticky routing; batch behavior; “refusals are invisible to error-rate monitoring” pitfall.
Updated synthesis recurring-read “engineering sets real cost”: model routing is now a first-class API primitive (server-side fallback + fallback-credit + sticky routing), not just app glue — hardening the cost-is-engineering thread into the provider’s own surface.
Updated index (added the source under TechArticle/sources).
Linked bridge nodes anthropic & claude-opus-4-8 cross-wiki (no dup).
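The monitoring pitfall recorded in this entry (a refusal arrives as an HTTP 200, so status-code error rates never see it) can be sketched with plain dictionaries. The field names `stop_reason`, `stop_details.category`, and `usage.iterations` are taken from the source summary above; everything else, including the function names, the in-process counter, and the shape of each `iterations` item, is an assumption for illustration, and no SDK call is shown.

```python
from collections import Counter

# Hypothetical in-process metric, keyed by refusal category.
refusal_counts = Counter()

def classify_response(status_code, body):
    """Return 'ok', 'refusal', or 'error' for one parsed API response."""
    if status_code != 200:
        return "error"
    # A refusal is a successful HTTP response, so it needs its own check.
    if body.get("stop_reason") == "refusal":
        details = body.get("stop_details") or {}
        refusal_counts[details.get("category", "unknown")] += 1
        return "refusal"
    return "ok"

def billed_attempts(body):
    """Count per-attempt entries, assuming usage.iterations is a list."""
    usage = body.get("usage") or {}
    return len(usage.get("iterations") or [])

# Illustrative payload; item contents of iterations are left empty
# because their schema is not captured in this log.
body = {
    "stop_reason": "refusal",
    "stop_details": {"category": "cyber"},
    "usage": {"iterations": [{}, {}]},
}
print(classify_response(200, body), billed_attempts(body))
# → refusal 2
```

Tracking refusals as their own outcome, separate from transport errors, is the design point: a dashboard built only on status codes would report this response as healthy.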

## [2026-06-12] ingest | Artificial Analysis — independent LLM benchmark platform

All-spokes daily expansion. Added artificial-analysis (@type WebPage): the neutral, reproducible, methodology-disclosed benchmark the open question explicitly wanted (a rigor step up from the flagged-as-biased llm-leaderboard-stats/llm-api-pricing-comparison). Captured the four-axis frame (quality/speed/price/context), the Intelligence Index v4.0 composite (GPQA Diamond, HLE, τ²-Bench, Terminal-Bench, SciCode, IFBench, …), the blended price at 7:2:1 cache:input:output, and live TTFT, including that it benchmarks API providers (the host/reseller layer, amazon-bedrock), not just labs. Resolves the “vendor/SEO bias — want a neutral benchmark” open question (residual caveat: disclosed editorial weighting + volatile snapshot). Wired to llm-benchmarks / llm-api-pricing; synthesis + index updated. 1 new page.
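As a worked example of the blended price noted in this entry, the sketch below treats the 7:2:1 cache:input:output ratio as weights in a weighted mean. That interpretation is an assumption, the function name `blended_price` is hypothetical, and the prices are made-up placeholders rather than figures from any source.

```python
def blended_price(cache, inp, out, weights=(7, 2, 1)):
    """Weighted mean of per-million-token prices.

    weights follow the cache:input:output ratio recorded in the entry;
    reading the ratio as a weighted mean is an assumption of this sketch.
    """
    w_cache, w_in, w_out = weights
    total = w_cache + w_in + w_out
    return (w_cache * cache + w_in * inp + w_out * out) / total

# Hypothetical prices in USD per million tokens.
print(blended_price(cache=0.5, inp=5.0, out=25.0))
# → 3.85
```

With these placeholder numbers the arithmetic is (7 × 0.5 + 2 × 5.0 + 1 × 25.0) / 10 = 38.5 / 10 = 3.85, which shows how heavily a cache-weighted blend discounts the output price.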

## [2026-06-13] quality | maintenance cycle (tier + freshness backfill)

Quality Cycle (not expansion — spoke grown 06-12, never pad). Backfilled tier: + freshness: volatile on all 15 source pages: T1×2 (official Anthropic/DeepSeek docs), T2×8 (Artificial Analysis + Wikipedia entities + HF/llm-stats), T3×5 (Google/AWS/CloudZero vendor pages). Operationalizes the volatile-snapshot staleness tracking the spoke most needs (pricing/leaderboards churn weekly). Scorecard → hub quality-log.md. 0 new pages. No synthesis/index change (frontmatter-only).

## [2026-06-15] ingest | Cohere / North Mini Code — thenewstack.io (honest stub)

Article body JS-gated; fetch failed. From headline + URL: Cohere is pivoting from enterprise sovereign-AI to developer-targeted coding models with “North Mini Code.” T4 (The New Stack trade press). Created cohere (Organization, first Cohere page) + cohere-north-mini-code (TechArticle stub). Entity discovery: Cohere is a new org node (no match in entity-index), but entity pages for a new org from a T4 trade-press stub without confirmed founding facts felt over-reaching — noted as a gap; a T1 primary source would warrant a full entity page. Synthesis updated with sovereign-AI niche. Runner-up: none.

## [2026-06-14] dedup | Google — made canonical here (merged the duplicate agentic-tooling node)

The entity-index fuzzy audit flagged Google paged twice (agentic-tooling + here). Per ENTITIES.md’s canonical-node rule, made this the one Google node (provider landscape is Google’s core identity; richer + url-sourced), folding in the agent-platform-vendor facet (ADK, agentskills adoption) with cross-links to adk/agentskills-spec, + aka: aliases. The agentic-tooling copy was deleted; its google links now resolve cross-wiki here.

## [2026-06-15] lint | North Mini Code — T4 stub upgraded to T3 primary source

Quality cycle: retried the JS-gated The New Stack stub, then sourced the primary instead: cohere’s own blog (cohere.com/blog/north-mini-code) + the Cohere Labs Hugging Face model card (found via WebSearch). Rewrote cohere-north-mini-code from a headline-only T4 stub into a full T3 model page (retyped TechArticle → SoftwareApplication): 30B MoE / 3B active, 128 experts (8/tok), 256K ctx / 64K gen, Apache-2.0, single-H100 FP8; SWE-Bench Verified 83.2% pass@1, Terminal-Bench v2 62.8%, Artificial Analysis Coding Index 33.4. Correction: the earlier stub framing (and the synthesis provider-map) called Cohere a non-open-weight sovereign player, but North Mini Code is in fact Apache-2.0 open-weight, so cohere, the synthesis “enterprise/sovereign” bullet, and the index were corrected to put Cohere on the open-weight axis (sovereignty via open weights, not opposite them). url swapped thenewstack.io → cohere.com.