Defined Term mechanism source ↗ source url updated Tue Jun 09 2026 00:00:00 GMT+0000 (Coordinated Universal Time)

Gradient descent

Gradient descent is “a first-order iterative algorithm for minimizing a differentiable multivariate function” — the canonical gradient-based optimizer, and the deliberate counterpoint to this wiki’s gradient-free metaheuristics (the gap the synthesis open questions named). Source: Wikipedia.

How it works

Step opposite the gradient: xₙ₊₁ = xₙ − η∇f(xₙ) — “repeated steps in the opposite direction of the gradient.” The learning rate η is the knob: “too small … slow convergence … too large … overshoot and divergence” (tuned by line search / backtracking). Requires a differentiable objective and converges only to a local minimum.

Why it matters here — the other half of optimization

It inverts every assumption of the founding corpus: it uses derivatives (metaheuristics are black-box), it’s exploitation-only (no population, no exploration), and it gives no global guarantee off-convexity. The no-free-lunch-theorem cuts both ways: where gradients exist and the landscape is benign, gradient descent crushes any metaheuristic on speed; where it’s black-box/non-differentiable/multimodal, the metaheuristics earn their keep. The workhorse of ML training (see stochastic-gradient-descent).

stochastic-gradient-descent · convex-optimization · metaheuristic-optimization · exploration-vs-exploitation · no-free-lunch-theorem · bayesian-optimization

Gradient descent

How it works

Why it matters here — the other half of optimization

Related

Linked from