Methodology
How the rankings are built
This page documents how the Top 100 list is constructed, what's in the data, and what's deliberately out. The Riemann-hypothesis ranking uses the same three-source composite design as the sister Goldbach site (arXiv preprint output, OpenAlex topical citations, and zbMATH MSC classifications). The Riemann hypothesis is a heavily arXiv-based field, so the current ranking is built mainly from the arXiv signal, with OpenAlex contributing where it overlaps; the zbMATH MSC layer is being integrated.
Data sources
| Source | What it gives | Limitations |
|---|---|---|
| arXiv (math.NT) | Preprint-level: titles, abstracts, authors, dates, co-author graph | Biased toward people who post preprints. Senior figures who publish only in journals are undercounted. |
| OpenAlex | Author-level: paper count, citations, affiliations, country | Concept tagging is noisy in math; surname-only matching can misidentify authors. The phrase "random matrix" pulls in wireless and signal-processing engineers, so it is excluded from the OpenAlex queries. |
| zbMATH Open | Curated math review database; canonical author codes; editor-assigned MSC classification (we use the two Riemann-hypothesis-core classes) | Coverage of older non-Western mathematicians is the best of the three sources; the REST API is gated behind a one-time Terms-of-Use acceptance. This layer is being folded into the ranking. |
Pipeline
Title-weighting. A paper can mention the Riemann hypothesis without being about it, for example a paper that cites it in its introduction as a famous open problem. To separate genuine work from passing mentions, the arXiv and OpenAlex pipelines weight a keyword match by where it appears: a match in the paper title counts at full weight, and a match only in the abstract counts at half (a factor of 0.5). zbMATH is not title-weighted, because its documents are classified by human editors, so the subject class itself is the relevance signal.
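As a rough illustration of the title-weighting rule, here is a minimal Python sketch. The function names, the dictionary layout of a paper, and the case-insensitive substring matching are assumptions made for the example; only the two weights (1.0 for a title match, 0.5 for an abstract-only match) come from the description above.

```python
from typing import Iterable, Mapping, Sequence

TITLE_WEIGHT = 1.0     # keyword appears in the title (from the text)
ABSTRACT_WEIGHT = 0.5  # keyword appears only in the abstract (from the text)


def match_weight(title: str, abstract: str, terms: Sequence[str]) -> float:
    """Return the weight of one paper for a set of search terms.

    Assumption: matching is a case-insensitive substring test. The real
    pipeline may tokenize or normalize text differently.
    """
    title_l = title.lower()
    abstract_l = abstract.lower()
    if any(term.lower() in title_l for term in terms):
        return TITLE_WEIGHT
    if any(term.lower() in abstract_l for term in terms):
        return ABSTRACT_WEIGHT
    return 0.0  # no match anywhere: the paper contributes nothing


def weighted_paper_count(papers: Iterable[Mapping[str, str]],
                         terms: Sequence[str]) -> float:
    """Sum the per-paper weights for one author's papers."""
    return sum(match_weight(p["title"], p["abstract"], terms) for p in papers)
```

Under these assumptions, an author with one paper that names the Riemann hypothesis in its title and one that mentions it only in the abstract would have a weighted paper count of 1.5 rather than 2.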
- arXiv pull: 15 search terms (Riemann hypothesis, Riemann zeta function, Dirichlet L-functions, critical line, critical strip, pair correlation, zero-density, zeros of the zeta function, nontrivial zeros, Selberg class, Mertens function, moments of the zeta function, Lindelof, Beurling-Nyman, and random matrix) restricted to the math.NT category. Each paper's contribution to an author is title-weighted as above. A co-authorship graph is built and eigenvector centrality is the second factor in an arXiv composite of `0.60 * pr(weighted papers) + 0.40 * pr(eigen)`. Authors with at least 3 topical papers qualify.
- OpenAlex pull: 14 phrase queries (the arXiv terms minus `random matrix`, which without a category guard floods the results with wireless and random-matrix-theory engineers), with an author cap of 10 per work to remove physics megapapers. Works and their citations are title-weighted as above. Composite: `0.60 * pr(weighted works) + 0.40 * pr(weighted citations)`. Result: 137 qualifying authors. Because the Riemann hypothesis is so strongly an arXiv-preprint field, OpenAlex overlaps with only a handful of the top-ranked researchers directly; the others are carried by their arXiv signal.
- zbMATH pull: documents tagged with either of the two Riemann-hypothesis-core MSC classes, 11M26 (zeros of zeta and L-functions and the Riemann hypothesis) or 11M50 (relations of the zeta function with random matrices and physics). The editor-assigned MSC classes correct a systematic gap in the other sources: pre-1995 number theorists and specialists who publish in journals with sparse arXiv presence. This pull is in progress and is being folded into the merged ranking.
- Merge and scoring: the rankings are surname-deduplicated and joined. The available ranks are combined with a weighted order statistic: each researcher's ranks are sorted and weighted `0.70` on the best, `0.20` on the middle, and `0.10` on the worst. Sorting before weighting means the method rewards excellence in any one Riemann-hypothesis pipeline (a researcher who is top in zbMATH but absent from arXiv would still score well), while a researcher strong across all of them still finishes ahead. Lower combined score ranks higher. An earlier design simply summed the ranks, which punished anyone outstanding in one source but weak in another; the weighted order statistic fixes that.
- Estimating a missing rank (interpolation): a researcher ranked by only one of the pipelines is not given a flat penalty. To estimate a missing rank, we order the whole pool by a pipeline the researcher does appear in, then walk outward to the two nearest researchers above and the two nearest below who carry a real rank in the missing pipeline, and average those (up to four) values. One rule protects the scoring: the `0.70` top weight may only land on a measured rank, so an estimate can support a researcher's score but can never be their headline signal. Estimated ranks show in [square brackets] on the Top 100 table; measured ranks show plain.
- Hand-curated edits: an exclusions file removes researchers the automated pipeline surfaced in error (see Audit decisions). The merge does not hand-place any researcher; everyone earns their rank from the pipeline scores.
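The merge and interpolation rules above can be sketched in Python as follows. This is one possible reading, not the site's actual code: the function names, the list-of-dicts pool layout, and the assumption that every researcher ends up with exactly three ranks (measured or estimated) are all hypothetical. The weights 0.70 / 0.20 / 0.10, the two-above and two-below neighbour search, and the rule that the top weight must land on a measured rank are taken from the text.

```python
from typing import List, Mapping, Optional, Sequence, Tuple

WEIGHTS = (0.70, 0.20, 0.10)  # best, middle, worst (from the text)


def estimate_missing_rank(ordered_pool: Sequence[Mapping[str, float]],
                          index: int,
                          missing: str) -> Optional[float]:
    """Estimate a rank in pipeline `missing` for the researcher at `index`.

    `ordered_pool` holds one dict of measured ranks per researcher and is
    assumed to be already sorted by a pipeline the researcher appears in.
    """
    def nearest(step: int) -> List[float]:
        # Walk outward in one direction, collecting up to two measured ranks.
        found: List[float] = []
        i = index + step
        while 0 <= i < len(ordered_pool) and len(found) < 2:
            rank = ordered_pool[i].get(missing)
            if rank is not None:
                found.append(rank)
            i += step
        return found

    neighbours = nearest(-1) + nearest(+1)  # up to four values
    if not neighbours:
        return None  # nobody in the pool has a measured rank to borrow from
    return sum(neighbours) / len(neighbours)


def combined_score(ranks: Sequence[Tuple[float, bool]]) -> float:
    """Weighted order statistic over three (rank, is_measured) pairs.

    The 0.70 weight goes to the best measured rank; the two remaining
    ranks, sorted, take 0.20 and 0.10. Lower scores rank higher.
    """
    if len(ranks) != 3:
        raise ValueError("this sketch assumes exactly three ranks")
    measured = [r for r, is_measured in ranks if is_measured]
    if not measured:
        raise ValueError("at least one measured rank is required")
    head = min(measured)
    rest = sorted(r for r, _ in ranks)
    rest.remove(head)  # drop one copy of the headline rank
    return WEIGHTS[0] * head + sum(w * r for w, r in zip(WEIGHTS[1:], rest))
```

For example, measured ranks of 3 and 50 with an estimated rank of 10 combine to 0.70 * 3 + 0.20 * 10 + 0.10 * 50 = 9.1. If the estimate were the best of the three values, the 0.70 weight would still fall on the best measured rank and the estimate would take the 0.20 slot.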
Audit decisions
Excluded
A small number of authors surfaced by the automated pipeline are removed by hand. Some work in unrelated fields, for example signal processing, wireless communications, or coding theory, and were pulled in by surname collisions or by the noisy random matrix topic before it was dropped from the OpenAlex queries. A few others are self-published authors whose output is not part of mainstream research. The specific names are kept internal: listing them here would only give them visibility, which is the opposite of the point.
What's not in this list
- Researchers without a strong digital footprint. The pipeline indexes arXiv well and OpenAlex moderately, so figures who publish mainly in journals are undercounted until the zbMATH MSC layer is fully integrated.
- Subjective importance. A theorist whose entire body of Riemann-hypothesis work is one influential paper may rank lower than a productive researcher with many adjacent papers. We rank by output, not by depth.
- Adjacent topics. The list covers the Riemann hypothesis and adjacent problems, so some of the 100 work mainly on related questions (pair correlation of zeros, moments of the zeta function, Dirichlet L-functions, the Selberg class, the random-matrix connection) rather than on the Riemann hypothesis directly. Title-weighting reduces, but does not eliminate, the appearance of researchers whose connection to the hypothesis itself is incidental.