

Tech Article. Updated 2026-06-01 (UTC).

How LLMs Choose Their Words: Logits, Softmax and Sampling

A practical, code-first walk-through (MachineLearningMastery, intermediate level) of how an LLM turns its raw output, the logits, into an actual next token. This is the final step of llm-inference: the token-sampling stage.

What it covers

Concrete example

Uses the prompt “Today’s weather is so ___” over a toy 6-token vocabulary, with PyTorch code for each strategy and plots of how the probability distribution reshapes under each.
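As a rough illustration of that setup, here is a minimal sketch of temperature-scaled softmax over a toy 6-token vocabulary. It uses plain Python rather than the article's PyTorch code, and the vocabulary words and logit values are hypothetical stand-ins, not the article's numbers.

```python
import math

# Hypothetical candidates for "Today's weather is so ___".
# These tokens and logits are illustrative assumptions, not the article's values.
vocab = ["nice", "hot", "cold", "bad", "wonderful", "strange"]
logits = [2.0, 1.5, 1.0, 0.5, 0.2, -1.0]


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Lower temperature sharpens the distribution; higher temperature flattens it.
for t in (0.5, 1.0, 2.0):
    probs = softmax(logits, temperature=t)
    print(f"T={t}:", {tok: round(p, 3) for tok, p in zip(vocab, probs)})
```

Under these assumed logits, the most likely token gains probability mass as the temperature drops and loses it as the temperature rises, which is the reshaping the article's plots are described as showing.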

Takeaway

“Sampling strategies shape the model’s next-token distribution.” The knobs trade consistency (low temperature, top-k = 1) against creativity/diversity (higher temperature, higher top-p). See token-sampling for the concept page. (Caveat: tutorial-level toy example, not a benchmark.)
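The trade-off in the takeaway can be sketched with top-k and top-p filtering applied to a fixed distribution. This is a minimal sketch in plain Python, not the article's PyTorch code; the vocabulary, the probability values, and the helper names (`top_k_filter`, `top_p_filter`, `sample`) are hypothetical.

```python
import random

# Hypothetical next-token distribution, already sorted for readability.
vocab = ["nice", "hot", "cold", "bad", "wonderful", "strange"]
probs = [0.40, 0.25, 0.15, 0.10, 0.07, 0.03]


def _renormalize(probs, keep):
    """Zero out tokens outside `keep` and rescale the rest to sum to 1."""
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]


def top_k_filter(probs, k):
    """Keep only the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, set(order[:k]))


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalize(probs, keep)


def sample(probs, rng):
    """Draw one token according to the given probabilities."""
    return rng.choices(vocab, weights=probs, k=1)[0]


rng = random.Random(0)  # seeded only so the sketch is repeatable
print("top-k=1:   ", [sample(top_k_filter(probs, 1), rng) for _ in range(5)])
print("top-p=0.5: ", [sample(top_p_filter(probs, 0.5), rng) for _ in range(5)])
print("top-p=0.85:", [sample(top_p_filter(probs, 0.85), rng) for _ in range(5)])
```

With top-k = 1 only the single most probable token survives, so sampling is effectively greedy and always returns the same word. With these assumed values, top-p = 0.5 keeps two candidates and top-p = 0.85 keeps four, so raising top-p widens the pool the sampler can draw from.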