No Free Lunch Theorems for Optimization (Wolpert & Macready, 1997)
The canonical primary source for the no-free-lunch-theorem this entire spoke is organized around. David H. Wolpert and William G. Macready, “No Free Lunch Theorems for Optimization,” IEEE Transactions on Evolutionary Computation 1(1): 67–82, 1997. Until now the wiki grounded its central thesis in Andrey Dik’s MQL5 benchmark blog (T3, single author); this page anchors it in the original result.
The framework
An optimization algorithm is modelled as a mapping that, given the sequence of points already
visited and their cost values, chooses the next point to sample — deterministic and non-repeating
(it never re-visits a point). Crucially, performance is measured only on the sequence of cost
values observed (the d^y_m sample after m evaluations), not on where in the domain they were
found. Spaces X and Y are finite. This abstraction is what makes the theorem hold for any
performance metric defined on the value sequence (best-so-far, mean, etc.).
The core result (Theorem 1)
Averaged uniformly over all possible cost functions f: X → Y, the probability of obtaining any
particular sequence of cost values is identical for any two algorithms a₁, a₂:
∑_f P(d^y_m | f, m, a₁) = ∑_f P(d^y_m | f, m, a₂)
So when every objective function is equally likely, no algorithm’s expected performance beats any other’s — including blind random search. The immediate corollary: any algorithm that does better than average on one class of problems pays for it with exactly offsetting inferiority on the complementary class. There is no universally superior optimizer.
Beyond the headline
- Time-varying objectives. A second, “more subtle” NFL result holds when the cost function changes during the search — the free lunch is unavailable there too.
- Alignment / geometric interpretation. Wolpert & Macready frame an algorithm’s average
performance as an inner product (alignment) between the algorithm and the problem distribution
P(f). Better-than-random performance comes not from sophistication but from matching the algorithm to the actual distribution of problems you face. - Head-to-head minimax. Equal averages do not mean two algorithms are interchangeable on specific problem pairs — their minimax (worst-case head-to-head) distinctions can differ. NFL constrains averages, not every comparison.
Why it grounds the spoke
This is the formal backbone under the wiki’s standing rule — record every ranking with its benchmark context; “best optimizer” is benchmark-relative. Dik’s empirical suite (a metaphor-free 1970s (μ+λ)-ES leading; PSO below random search; cma-es only ~38/45) is NFL observed in the wild, and the theorem says exactly why reputation can’t predict rank. It also sharpens the practical caveat the concept page already carries: NFL assumes a uniform distribution over all functions, which real problems violate — so structure- exploiting methods genuinely win on realistic classes. The theorem is a caution against universal claims, not against optimization itself: improvement requires “advance knowledge” of the problem distribution, which is precisely what model-based and structure-exploiting (convex-optimization) methods encode.
Tier
T1 — the peer-reviewed primary paper (IEEE TEC), the authoritative origin of the result. Replaces reliance on the T3 Dik blog for the theorem itself (the blog remains the source for its empirical illustration). Formal statements here are corroborated against the Wikipedia NFL article; the IEEE PDF did not machine-extract, so the summary draws on the well-established canonical formulation.
Related
no-free-lunch-theorem · population-optimization-benchmark · metaheuristic-optimization · cma-es · evolution-strategies · bayesian-optimization · test-functions-for-optimization