
Defined Term · mechanism · updated 2026-06-09

Stochastic gradient descent (SGD) & Adam

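SGD is a variant of gradient-descent that “replaces the actual gradient … by an estimate … calculated from a randomly selected subset of the data.” It is the workhorse of deep-learning training, trading exactness for “faster iterations”: each update touches only a mini-batch, so it is cheap to compute but points in a noisier direction than the full gradient. Its adaptive descendants, Adam among them, are among the most widely used optimizers in deep learning. Source: Wikipedia.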
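The definition above can be made concrete with a minimal sketch of mini-batch SGD on a linear least-squares problem. Everything here is an illustrative assumption rather than part of the source: the function name `sgd_least_squares`, the synthetic data, and the hyperparameters (`lr`, `batch_size`, `epochs`) are hypothetical choices picked so the example converges.

```python
import numpy as np

def sgd_least_squares(X, y, lr=0.05, batch_size=8, epochs=200, seed=0):
    """Mini-batch SGD for minimizing mean((X @ w - y) ** 2).

    Hypothetical helper for illustration; hyperparameters are assumptions.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the data every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            residual = X[idx] @ w - y[idx]
            # Gradient of the loss on this random subset only:
            # an estimate of the full-data gradient.
            grad = 2.0 * X[idx].T @ residual / len(idx)
            w -= lr * grad
    return w

# Synthetic, noiseless regression problem (assumed for the demo).
rng = np.random.default_rng(42)
X = rng.normal(size=(256, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w

w_hat = sgd_least_squares(X, y)
print(w_hat)
```

Each step looks at 8 of the 256 rows, so an update costs a fraction of a full-gradient step. Because the targets here are noiseless, the estimate should settle very close to `true_w`; with noisy targets and a fixed step size it would instead hover around the least-squares solution.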

The lineage
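As one point on that lineage, the Adam update named in the page title can be sketched as follows. This is a minimal sketch under stated assumptions, not a reference implementation: `adam_minimize`, the toy quadratic objective, and the step count are hypothetical, and the `beta1`, `beta2`, and `eps` values are the commonly cited defaults.

```python
import numpy as np

def adam_minimize(grad_fn, w0, lr=0.05, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=2000):
    """Plain Adam loop (hypothetical helper; lr and steps are assumptions)."""
    w = np.asarray(w0, dtype=float).copy()
    m = np.zeros_like(w)  # running mean of gradients (first moment)
    v = np.zeros_like(w)  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        # Bias correction: m and v start at zero and are biased toward it early on.
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        # Per-coordinate step: large past gradients shrink the effective step.
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

# Toy objective (assumed for the demo): f(w) = sum((w - target) ** 2).
target = np.array([3.0, -1.0])
grad_fn = lambda w: 2.0 * (w - target)

w_adam = adam_minimize(grad_fn, w0=np.zeros(2))
print(w_adam)
```

The gradient here is exact rather than sampled, to keep the demo deterministic; in training, `grad_fn` would return a mini-batch estimate, and the two running averages are what make the step size adaptive per coordinate.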

Why it matters here

This is the optimizer the rest of the AI stack actually runs on, and the bridge to ../llm-providers-wiki and ../llm-inference-wiki: it trains the very models covered elsewhere in the hub. It also reframes exploration-vs-exploitation. SGD’s gradient noise acts as cheap, implicit exploration that helps escape sharp minima, a different answer from the explicit population that many metaheuristics maintain. It is still a local method and still needs gradients (cf. the black-box metaheuristics and model-based bayesian-optimization).
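The "gradient noise" in that argument can be seen directly by comparing mini-batch gradients with the full-data gradient at a fixed point. The sketch below is illustrative only: the helper `grad_mse`, the synthetic regression data, and the batch size are assumptions, and it shows the noise itself, not the escape from sharp minima.

```python
import numpy as np

def grad_mse(w, X, y):
    """Gradient of mean((X @ w - y) ** 2) with respect to w (hypothetical helper)."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

# Synthetic data (assumed for the demo).
rng = np.random.default_rng(0)
X = rng.normal(size=(512, 4))
y = X @ np.array([1.0, -1.0, 2.0, 0.5]) + 0.1 * rng.normal(size=512)
w = np.zeros(4)  # evaluate all gradients at the same point

full_grad = grad_mse(w, X, y)

# Split a random permutation into 16 equal mini-batches of 32 rows each.
order = rng.permutation(len(y))
batches = np.array_split(order, 16)
batch_grads = np.stack([grad_mse(w, X[b], y[b]) for b in batches])

# Averaged over an equal-sized partition, the batch gradients recover the
# full gradient; individually, each one deviates from it.
mean_grad = batch_grads.mean(axis=0)
noise_norms = np.linalg.norm(batch_grads - full_grad, axis=1)

print("max |mean batch grad - full grad|:", np.abs(mean_grad - full_grad).max())
print("per-batch deviation norms:", noise_norms.round(3))
```

The per-batch deviations are the perturbation SGD applies for free at every step: the expected direction matches gradient descent, while the scatter around it plays the role that an explicit population or a proposal distribution plays in other methods.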

gradient-descent · convex-optimization · bayesian-optimization · metaheuristic-optimization · exploration-vs-exploitation