Blog posting · updated 2026-06-04 (UTC)

Introducing Gemma 4 12B (Google)

Google’s announcement of Gemma 4 12B, a new open-weight variant in the gemma-4 family from google. It sits between the small E4B and the larger 26B-A4B MoE (open-source-llms-2026): the dense, mid-size member, aimed at local deployment on consumer hardware.

What’s announced

A unified, encoder-free multimodal model — vision and audio inputs flow straight through the language-model backbone, no separate modality encoders.
- Vision: a lightweight embedding module — “a single matrix multiplication, positional embedding and normalizations.”
- Audio: native input, projecting “raw audio signal into the same dimensional space as text tokens” — no audio encoder at all.
“Advanced reasoning” — pitched for “powerful multi-step reasoning and agentic workflows.”
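To make the encoder-free idea concrete, here is a minimal NumPy sketch of what the quoted descriptions could look like. It is an illustration, not Gemma's implementation. The function names, patch size, audio frame length, model width, and random weights are all hypothetical, and splitting the raw waveform into fixed-length frames is an assumption the announcement does not state.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Normalize each row to zero mean and unit variance (no learned scale/shift)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def embed_image_patches(image, patch, w_proj, pos_emb):
    """Hypothetical vision path: patches -> one matmul -> positions -> norm."""
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    # Split the (H, W, C) image into non-overlapping patches and flatten each one.
    patches = (
        image[: gh * patch, : gw * patch]
        .reshape(gh, patch, gw, patch, c)
        .transpose(0, 2, 1, 3, 4)
        .reshape(gh * gw, patch * patch * c)
    )
    tokens = patches @ w_proj              # the single matrix multiplication
    tokens = tokens + pos_emb[: gh * gw]   # positional embedding
    return layer_norm(tokens)              # normalization

def embed_audio_frames(waveform, frame, w_proj):
    """Hypothetical audio path: raw samples are framed (an assumption) and projected."""
    n = len(waveform) // frame
    frames = waveform[: n * frame].reshape(n, frame)
    return frames @ w_proj                 # same width as the text embeddings

rng = np.random.default_rng(0)
d_model, patch, frame = 64, 8, 400         # illustrative sizes only

image = rng.random((32, 32, 3))
w_img = rng.normal(size=(patch * patch * 3, d_model))
pos = rng.normal(size=(16, d_model))
img_tokens = embed_image_patches(image, patch, w_img, pos)

audio = rng.normal(size=16000)
w_aud = rng.normal(size=(frame, d_model))
audio_tokens = embed_audio_frames(audio, frame, w_aud)

text_tokens = rng.normal(size=(5, d_model))  # stand-in for embedded text tokens

# All modalities share d_model, so they concatenate into one sequence for the backbone.
sequence = np.concatenate([img_tokens, audio_tokens, text_tokens], axis=0)
print(sequence.shape)  # (61, 64)
```

The point of the sketch is the shape bookkeeping: once image patches and audio frames are projected to the same width as text embeddings, a single backbone can consume them as one token stream without a separate encoder per modality.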

Numbers

Performance “nearing our 26B model” on standard benchmarks (cf. llm-benchmarks) at less than half the total memory footprint.
Runs locally on 16GB of VRAM or unified memory.
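A rough back-of-envelope check helps put the 16GB figure in context. The sketch below assumes "12B" means about 12 billion parameters, counts weights only (no KV cache, activations, or runtime overhead), and treats 16GB as 16 GiB. The precisions listed are illustrative; the announcement as summarized here does not say which one the 16GB claim refers to.

```python
def weight_memory_gib(n_params, bits_per_weight):
    """Memory needed for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30

N_PARAMS = 12e9   # assumption: "12B" is roughly 12 billion parameters
BUDGET_GIB = 16   # from the announcement's 16GB figure, read here as GiB

for name, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    gib = weight_memory_gib(N_PARAMS, bits)
    verdict = "fits" if gib < BUDGET_GIB else "does not fit"
    print(f"{name}: {gib:.1f} GiB of weights, {verdict} in {BUDGET_GIB} GiB")
```

Under these assumptions, 16-bit weights alone come to roughly 22 GiB, so fitting in 16GB would presumably rely on 8-bit or lower precision weights. That is an inference from the arithmetic, not a statement from the announcement.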

Licensing & availability

Apache-2.0 (open-weight-models) — weights open on Hugging Face and Kaggle.
Framework support: Hugging Face Transformers, llama.cpp, MLX, vLLM (llm-inference).
No API pricing in the announcement (open weights; self-host — cf. llm-api-pricing).

Why it matters

Two firsts for this wiki’s open-weight thread: (1) multimodality including native audio in a locally-runnable open model, and (2) an encoder-free architecture that folds audio into the token stream. The competitive lever is memory: quality near the 26B model at less than half the memory footprint, runnable on a 16GB laptop or GPU.

gemma-4 · google · open-weight-models · open-source-llms-2026 · llm-benchmarks · llm-inference
