llm-providers-wiki

Synthesis — LLM Providers

The evolving thesis. Spun out of the hub _inbox llm-providers cluster on 2026-06-01, seeded by deepseek-api-docs and grown with three landscape sources the router went and found (open-weight roundup, API-pricing comparison, the llm-stats leaderboard).

Current thesis

The 2026 model market is defined by a collapsing price floor meeting a still-premium frontier, with open weights as the force pushing the two apart.

A ~600× cost spread (llm-api-pricing): budget APIs at ~$0.10/1M input vs frontier reasoning at $30/$180. deepseek disrupted the bottom — capable open weights plus an OpenAI-compatible ultra-cheap API — dragging the floor down and forcing proprietary labs to justify their premium.
Open-weight has caught up enough for production (open-weight-models): Llama 4, Qwen3, gemma-4, DeepSeek, Kimi, GLM. Two structural enablers — sparse MoE (frontier capability at practical inference cost) and Apache-2.0/MIT licensing clarity — plus a context explosion (256K → 1M → 10M tokens), and now multimodality at the local tier: gemma-4 12B runs encoder-free vision + native audio on 16GB (gemma-4-12b-announcement), making memory footprint a competitive axis alongside capability/cost/context. quantization is the second footprint lever (after sparse MoE): Gemma 4’s QAT checkpoints push the floor to a sub-1GB E2B and a 2-bit mobile scheme, claimed above PTQ quality gemma-4-qat — footprint is now contested by training technique, not just architecture.
“Best model” is multi-axis (llm-benchmarks): composite scores over capability × speed × price × context. Proprietary labs (anthropic, OpenAI) still lead reasoning; open/Chinese labs (Qwen, DeepSeek) lead cost-per-quality; xAI leads context.

The unifying tension: capability still concentrates at a few proprietary frontier labs, while cost and access are being democratized from below by open weights. Whether the frontier premium holds depends on whether open-weight reasoning closes the gap — the live question of the wiki.

The provider map now sorts into five positions plus a reseller layer. The market is no longer a simple open-vs-closed binary; the players cluster by strategy:

Consumer frontier (closed): openai (GPT/o-series; the API shape everyone else clones) and anthropic hold the reasoning lead and the $30/$180 premium. xai-grok sits here too — proprietary, reasoning-focused — but is distinctive for a 2M-token context (the “xAI leads context” claim) and an open→closed arc (Grok-1 was Apache-2.0; everything since closed), the inverse of the open-first labs.
Open-weight challengers: qwen (Alibaba) is the strongest evidence open weights are closing from below — the deepest open model ladder (0.6–32B dense + MoE; reasoning/VL/audio/Coder/omni), mostly Apache-2.0, 200k+ Hugging Face derivatives, and open-first (unlike dual-track google) — alongside deepseek and Meta’s llama, the family that catalyzed the wave.
European open-weight: mistral-ai (Apache-2.0 Mixtral MoE) — the open axis is no longer China-only.
Enterprise / sovereign: cohere competes not on consumer reach but on data residency and compliance (private cloud, on-premise, regional EU/APAC/UK) for regulated industries — the founders’ Transformer-paper roots (Aidan Gomez) behind an early “same architecture, different deployment model” bet. Crucially its North Mini Code (9 Jun 2026, the first North-family model) is Apache-2.0 open-weight (30B MoE / 3B active, runs on a single H100): Cohere pursues sovereignty through open weights and local deployment, so it sits on the open-weight axis too, not opposite it — the sovereign angle is go-to-market, not licensing. It scores 33.4 on the independent artificial-analysis Coding Index, opening a developer-coding play on top of the enterprise base.
Dual-track: google ships both a closed frontier (gemini) and permissive open weights (gemma-4) — see Recurring reads.

Llama also sharpens the licensing thread: open-weight but not OSI-open (custom community license, OSI-disputed), against the Apache-2.0 cleanliness of gemma-4/mistral-ai/qwen — “open” is a spectrum, not a binary. Across the top sits a cloud-reseller layer: amazon-bedrock is not a model maker but a one-API aggregator of Claude/Llama/Mistral/Cohere/Nova — the demand-side mirror of the map, where model choice becomes a config parameter and differentiation moves up to routing/caching/data. Through a reseller, open and closed models are the same API call; the buyer meets the whole price/openness spectrum through one pane.

Recurring reads

Compatibility as strategy — challengers (deepseek) conform to OpenAI-compatible (and now Anthropic-compatible) API shapes to erase switching costs; the incumbent API is becoming a de-facto standard.
Cost is dominated by output tokens (2–6× input) and slashed by caching/batch/routing — so engineering, not just model choice, sets real cost. Model routing is now a first-class API primitive, not just app glue (claude-refusals-and-fallback): Anthropic ships server-side fallbacks that retry a safety-declined request down a model chain (fable-5 → opus-4.8) inside one call, with per-attempt billing (usage.iterations[]), a fallback-credit beta to avoid double-paying prompt cache, and sticky routing that pins a conversation to whoever accepted. The “engineering sets real cost” thread hardens into the provider’s own surface — and note refusals are an HTTP 200, invisible to error-rate monitoring.
Dual-track labs — google ships both a closed frontier (gemini) and a permissive open-weight family (gemma-4), unlike the open-only Chinese labs or closed-only frontier labs. Open weights double as developer-mindshare seeding, not just a product. The May 2026 roundup google-ai-updates-may-2026 adds a wrinkle: both tracks are now pitched at the same use-case — agents and coding (Gemini 3.5 “frontier intelligence for agents and coding”; Gemma 4 12B “agentic workflows”). The open/closed split is about licensing and footprint, not target workload — a lab can run one agent-and-coding pitch across the whole price/openness spectrum.

Open questions

Does the frontier premium survive? If open-weight reasoning (DeepSeek/Qwen) closes on Anthropic/OpenAI, the $30/$180 tier loses its moat. Watch llm-benchmarks over time.
How stale is this? Pricing and rankings churn weekly; every page here is a dated snapshot.
Vendor/SEO bias — pricing-comparison and leaderboard sources have incentives; numbers are indicative. A neutral, reproducible benchmark would be the highest-value next source. Addressed (2026-06-12): artificial-analysis is the methodology-disclosed, continuously-re-run independent platform (Intelligence Index v4.0 = composite of GPQA Diamond / HLE / τ²-Bench / Terminal-Bench / SciCode; blended price at a 7:2:1 cache:input:output ratio; live TTFT) — the reproducible yardstick to track the “does the frontier premium survive?” question against. It also makes the host/reseller layer measurable (same weights across providers). Residual caveat: composite weighting + the 7:2:1 blend are disclosed editorial choices, and it’s still a churning snapshot.
Geopolitics/licensing — Chinese open-weight leaders (DeepSeek, Qwen, Kimi) dominate the open field; enterprise/regulatory acceptance is unmodeled here.

Contradictions / tensions

None internal yet. Cross-source tension to watch: leaderboards crown proprietary reasoning while the open-weight roundup argues practical fit beats rank — a framing disagreement, not a fact conflict.

Cross-spoke adjacency

research-wiki — owns anthropic & claude-opus-4-8 as the model substrate under the tools-for-thought ecosystem (capability+cost lens). Here they’re entries in the broader market.
llm-inference-wiki — how these models run (MoE efficiency, kv-cache, serving). This spoke is who makes them and what they cost.
agentic-tooling-wiki — consumes providers (model selection, “Sonnet+harness > raw Opus”); the cost/capability frontier here sets the harness economics there.

Index — LLM Providers Wiki

Catalog of every page, grouped by schema.org @type. Spine: synthesis (thesis), log.md (history), this file (catalog). Some wiki-links resolve cross-wiki (research-wiki, llm-inference-wiki) — intentional bridge links. Pricing/ranking facts are dated snapshots.

DefinedTerm (concepts)

llm-provider — umbrella: the kinds of providers (frontier labs, open-weight labs, cloud resellers) and their axes · domain
open-weight-models — the open-weight wave: MoE, Apache-2.0/MIT licensing, local deploy · concept
llm-api-pricing — per-token pricing tiers, the ~600× spread, caching/batch/routing levers · concept
llm-benchmarks — composite multi-axis leaderboards (capability × speed × price × context) · concept
quantization — int4/2-bit precision as a footprint lever; QAT vs PTQ; sub-1GB open models · mechanism

Organization (providers)

deepseek — Chinese lab; low-cost OpenAI-compatible API + strong open weights; the seed subject
google — dual-track: proprietary Gemini frontier + open-weight Gemma family
openai — frontier incumbent; proprietary GPT/o-series; its API is the de-facto compatibility standard · source
mistral-ai — Europe’s leading lab; Apache-2.0 open weights (Mixtral MoE) + proprietary API · source
qwen — Alibaba; the leading Chinese open-first family (Apache-2.0; deepest ladder; 200k+ HF derivatives) · source
xai-grok — Elon Musk’s xAI; proprietary frontier; 2M-token context; open→closed arc (Grok-1 was Apache-2.0) · source
amazon-bedrock — AWS reseller/aggregator: many providers’ models behind one API (the cloud-reseller axis) · source
cohere — Canadian enterprise/sovereign-AI lab; Command + North model families; no consumer product

SoftwareApplication (models)

gemma-4 — Google’s open-weight family (E4B / 12B / 26B-A4B); Apache-2.0; encoder-free multimodal 12B
gemini — Google’s closed-weight frontier family (Gemini 3.5 agents/coding, Omni video, for Science)
llama — Meta’s open-weight family (Llama 1→4); catalyzed the open wave; custom (non-OSI) license · source
cohere-north-mini-code — Cohere’s first agentic coding model; Apache-2.0 open-weight 30B MoE (3B active), single-H100, 256K ctx; SWE-Bench 83.2% · source · T3 · cohere.com

TechArticle / BlogPosting / Article / Dataset (sources)

deepseek-api-docs — DeepSeek’s API reference (V4 Flash/Pro, thinking mode, context caching) · source · api-docs.deepseek.com
open-source-llms-2026 — Hugging Face: the 2026 open-weight roundup (Llama 4, Qwen3, Gemma 4, Kimi, …) · source · huggingface.co
gemma-4-12b-announcement — Google: Gemma 4 12B, encoder-free multimodal (vision+audio), 16GB local · source · blog.google
google-ai-updates-may-2026 — Google: May 2026 AI roundup (Gemini 3.5, Omni, for Science; + out-of-scope products) · source · blog.google
gemma-4-qat — Google: Gemma 4 quantization-aware-training checkpoints (Q4_0 + 2-bit mobile; sub-1GB E2B) · source · blog.google
llm-api-pricing-comparison — CloudZero: every major model ranked by cost; the ~600× spread · source · cloudzero.com
llm-leaderboard-stats — llm-stats.com: 300+ models by composite intelligence/speed/price · source · llm-stats.com
artificial-analysis — independent, methodology-disclosed benchmark (Intelligence Index v4.0, blended price 7:2:1, TTFT); the neutral reproducible yardstick · source · artificialanalysis.ai
claude-refusals-and-fallback — Anthropic API: the refusal stop_reason + model-fallback (server-side/SDK/manual), fallback-credit billing, sticky routing · source · platform.claude.com

Synthesis

synthesis — the thesis: collapsing price floor vs. premium frontier, open weights as the wedge

Bridge nodes (live in sibling wikis, linked cross-wiki)

anthropic · claude-opus-4-8 (research-wiki) · llm-inference · kv-cache (llm-inference-wiki)