Tech Article. Updated Mon Jun 01 2026.

How LLMs Choose Their Words: Logits, Softmax and Sampling

A practical, code-first walk-through (MachineLearningMastery, intermediate level) of how an LLM turns its raw output into an actual next token. This is the token-sampling stage, the final step of llm-inference.

What it covers

Logits → probabilities. The model emits a raw score (a logit) per vocabulary token; softmax normalizes these into a probability distribution that sums to 1.
Temperature. A scaling knob applied before softmax: each logit is divided by the temperature T. T < 1 sharpens the distribution toward the top tokens (more deterministic); T > 1 flattens it (more random); T = 1 leaves it unchanged.
Top-k sampling. Keep only the k highest-probability tokens, renormalize, sample. k = 1 degenerates to greedy decoding.
Top-p (nucleus) sampling. Keep the smallest set of tokens whose cumulative probability reaches p; the cutoff adapts to the model’s confidence rather than a fixed count.
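
The four steps above can be sketched in a few lines. This is a minimal illustration in plain Python rather than the PyTorch code the article uses; the function names (softmax, top_k_filter, top_p_filter) and the logit values are assumptions made for this sketch and are not taken from the article.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities, dividing by temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep the k most probable tokens, zero the rest, renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set(order[:k])
    total = sum(probs[i] for i in kept)
    return [q / total if i in kept else 0.0 for i, q in enumerate(probs)]

def top_p_filter(probs, p):
    """Keep the smallest high-probability set whose cumulative mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return [q / total if i in kept else 0.0 for i, q in enumerate(probs)]

# Hypothetical logits for a 6-token vocabulary (not the article's values).
logits = [2.0, 1.0, 0.5, 0.0, -1.0, -2.0]

probs = softmax(logits)                    # T = 1: plain softmax
sharp = softmax(logits, temperature=0.5)   # T < 1: more peaked
flat = softmax(logits, temperature=2.0)    # T > 1: flatter
top2 = top_k_filter(probs, k=2)            # only two tokens survive
nucleus = top_p_filter(probs, p=0.9)       # as many tokens as needed to reach 0.9
```

Each filter returns a full-length list with zeros for the dropped tokens, so the result can be passed directly to a weighted sampler. With top_k_filter(probs, 1) all of the mass lands on the single most likely token, which is the greedy-decoding case.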

Concrete example

Uses the prompt “Today’s weather is so ___” over a toy 6-token vocabulary, with PyTorch code for each strategy and plots of how the probability distribution reshapes under each.
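
As a rough illustration of what such a toy setup might look like, the sketch below draws next tokens for the prompt at three temperatures. The six candidate words and their logits are hypothetical placeholders chosen for this sketch; the article's actual vocabulary, values, and PyTorch code are not reproduced here.

```python
import math
import random

# Hypothetical 6-token vocabulary and logits (assumed, not from the article).
vocab = ["nice", "warm", "cold", "hot", "bad", "purple"]
logits = [2.0, 1.5, 1.0, 0.8, 0.2, -3.0]

def next_token_probs(logits, temperature=1.0):
    """Softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample(vocab, probs, rng):
    """Draw one token according to the given probabilities."""
    return rng.choices(vocab, weights=probs, k=1)[0]

rng = random.Random(0)  # seeded so repeated runs give the same draws
results = {}
for t in (0.5, 1.0, 2.0):
    probs = next_token_probs(logits, t)
    draws = [sample(vocab, probs, rng) for _ in range(5)]
    results[t] = (probs, draws)
    print(f"T={t}: top probability {max(probs):.2f}, draws {draws}")

# Greedy decoding skips sampling and takes the highest-scoring token.
greedy = vocab[max(range(len(logits)), key=lambda i: logits[i])]
print("greedy:", greedy)
```

Lower temperatures concentrate the draws on the top-scoring word, while higher ones give the tail tokens a realistic chance of being picked.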

Takeaway

“Sampling strategies shape the model’s next-token distribution.” The knobs trade consistency (low temperature, top-k = 1) against creativity/diversity (higher temperature, higher top-p). See token-sampling for the concept page. (Caveat: tutorial-level toy example, not a benchmark.)
