Spokes.wiki Search Graph Growth About

llm-providers-wiki

Defined Term mechanism updated Fri Jun 05 2026 00:00:00 GMT+0000 (Coordinated Universal Time)

Quantization

Storing and running a model’s weights (and sometimes activations) at lower numeric precision — int8, int4, even 2-bit — instead of 16-bit floats, to shrink its memory footprint and speed inference. In this wiki it matters as a market lever: quantization is a primary reason capable open-weight-models now run on consumer and edge hardware, making footprint a competitive axis alongside capability, cost, and context (synthesis).

QAT vs PTQ

Levels seen in the wild

Boundary (cross-wiki)

This page covers quantization as a deployability / market lever (who can run what, where). The deeper inference mechanics of low-precision execution — kernels, dequant, throughput — belong to llm-inference-wiki (llm-inference), alongside the kv-cache / MoE efficiency story. Distribution formats (GGUF, compressed tensors) ride along with it.

gemma-4-qat · gemma-4 · open-weight-models · google · llm-inference