Blog post, updated Fri Jun 05 2026

Gemma 4 with Quantization-Aware Training (Google)

Google’s announcement of QAT-optimized checkpoints for the gemma-4 family: the next move in the family’s footprint-first strategy, pushing open weights (open-weight-models) into ever-smaller memory envelopes via quantization.

What’s announced

QAT checkpoints released for E2B (edge 2B), E4B (edge 4B), and the 26B MoE variants of gemma-4.
Two quantization schemes:
- Q4_0 — standard 4-bit quantization across all models.
- Mobile-specialized format — targeted 2-bit quantization on token-generation layers while keeping reasoning components at higher precision, plus static activation scaling and channel-wise quantization.
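To make the Q4_0 entry concrete, here is a minimal sketch of a Q4_0-style block round trip, assuming the ggml/GGUF convention of 32 weights per block with one half-precision scale and 4-bit codes offset by 8. The function names are illustrative only, and the real llama.cpp implementation also packs two codes per byte, which is omitted here.

```python
import struct

BLOCK = 32  # weights per block, following the ggml Q4_0 convention (assumed)


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_block(weights):
    """Return (scale, codes) for one block; codes are integers in 0..15."""
    assert len(weights) == BLOCK
    # The weight with the largest magnitude maps to code 0, i.e. to -8 * scale.
    extreme = max(weights, key=abs)
    scale = to_fp16(extreme / -8.0)
    if scale == 0.0:
        return 0.0, [8] * BLOCK  # all-zero block: every code decodes to 0
    # Shift by 8 and round half up, then clamp into the 4-bit range.
    codes = [min(15, max(0, int(w / scale + 8.5))) for w in weights]
    return scale, codes


def dequantize_block(scale, codes):
    """Reconstruct approximate weights from a scale and 4-bit codes."""
    return [scale * (c - 8) for c in codes]


def bits_per_weight():
    """Storage cost: one 16-bit scale plus 32 four-bit codes per block."""
    return (16 + 4 * BLOCK) / BLOCK
```

Under these assumptions the per-block scale adds overhead, so the effective cost is 4.5 bits per weight rather than exactly 4.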
QAT vs PTQ: because quantization is folded into training (not applied post-hoc), Google claims it “yield[s] even higher overall quality compared to standard PTQ baselines” — i.e. less quality loss per bit than post-training quantization.
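The difference between QAT and PTQ can be illustrated with a toy sketch of "fake quantization" plus a straight-through estimator (STE), a common way to fold quantization into training. This is a generic illustration, not Google's recipe: the linear model, the fixed scale, and the names `fake_quant` and `qat_step` are all assumptions made for the example.

```python
def fake_quant(w, scale, bits=4):
    """Quantize then dequantize: snap w to the nearest value on a signed grid."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = max(qmin, min(qmax, round(w / scale)))
    return q * scale


def qat_step(weights, x, y, scale, lr):
    """One SGD step on squared error for a linear model y ~ sum(w_i * x_i).

    The forward pass sees quantized weights, so the loss already reflects
    quantization error. The backward pass uses the STE: the rounding step is
    treated as the identity, so the gradient updates the latent float weights.
    (Simplification: gradients are not zeroed for weights outside the clamp range.)
    """
    wq = [fake_quant(w, scale) for w in weights]
    err = sum(a * b for a, b in zip(wq, x)) - y
    new_weights = [w - lr * 2.0 * err * xi for w, xi in zip(weights, x)]
    return new_weights, err * err
```

PTQ, by contrast, would train the float weights without `fake_quant` in the loop and round them once at the end, so the optimizer never gets a chance to compensate for rounding error.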

Numbers

The E2B text-only model (without Per-Layer Embeddings) reportedly requires < 1 GB of memory — sub-gigabyte LLM deployment.
(A VRAM comparison chart is referenced for the other sizes; specific figures not in the text.)
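A back-of-the-envelope sketch of the arithmetic behind a footprint figure like this one: weight memory is roughly parameter count times average bits per weight, divided by 8. The 2-billion parameter count below is a placeholder, since the source does not give the exact count for the text-only model without Per-Layer Embeddings, and the estimate covers weights only (no KV cache or activations).

```python
def weight_bytes(n_params, bits_per_weight):
    """Approximate weight storage in bytes for a given average bit width."""
    return n_params * bits_per_weight / 8


def max_bits_per_weight(n_params, budget_bytes):
    """Largest average bit width that fits n_params weights in budget_bytes."""
    return budget_bytes * 8 / n_params


# Hypothetical parameter count, for illustration only.
N_PARAMS = 2_000_000_000

print(weight_bytes(N_PARAMS, 4.5))           # 4-bit codes plus per-block scale overhead
print(max_bits_per_weight(N_PARAMS, 10**9))  # average bits allowed under a 1 GB budget
```

Under that placeholder count, staying below 1 GB requires an average of fewer than 4 bits per weight, which is the kind of budget where mixing 2-bit layers with higher-precision ones becomes relevant.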

Tooling & availability

On Hugging Face in GGUF and compressed-tensors formats.
Deploys via llama.cpp, Ollama, LM Studio, vLLM, SGLang, MLX, Transformers.js, LiteRT-LM, Unsloth — a notably broad runtime list skewed to local / edge / mobile inference (llm-inference).

Why it matters

Confirms memory footprint as a competitive axis for gemma-4: the 12B made the quality case at 16 GB (gemma-4-12b-announcement); this release makes the floor case, with capable open models under 1 GB, runnable on phones. Quantization, not just architecture, is now an explicit lever in the open-weight market (synthesis). No API pricing: these are self-host weights (llm-api-pricing).

gemma-4 · quantization · google · open-weight-models · gemma-4-12b-announcement · llm-inference
