Gradient descent
Gradient descent is “a first-order iterative algorithm for minimizing a differentiable multivariate function” — the canonical gradient-based optimizer, and the deliberate counterpoint to this wiki’s gradient-free metaheuristics (the gap the synthesis open questions named). Source: Wikipedia.
How it works
Step opposite the gradient: xₙ₊₁ = xₙ − η∇f(xₙ) — “repeated steps in the opposite direction of the
gradient.” The learning rate η is the knob: “too small … slow convergence … too large … overshoot
and divergence” (tuned by line search / backtracking). Requires a differentiable objective and
converges only to a local minimum.
Why it matters here — the other half of optimization
It inverts every assumption of the founding corpus: it uses derivatives (metaheuristics are black-box), it’s exploitation-only (no population, no exploration), and it gives no global guarantee off-convexity. The no-free-lunch-theorem cuts both ways: where gradients exist and the landscape is benign, gradient descent crushes any metaheuristic on speed; where it’s black-box/non-differentiable/multimodal, the metaheuristics earn their keep. The workhorse of ML training (see stochastic-gradient-descent).
Related
stochastic-gradient-descent · convex-optimization · metaheuristic-optimization · exploration-vs-exploitation · no-free-lunch-theorem · bayesian-optimization