Fullrank: Bayesian Noisy Sorting


Fullrank is an interactive CLI tool for Bayesian inference of list rankings based on noisy comparisons. It takes a list of items, then efficiently prompts the user to compare pairs of items until the user decides that the posterior distribution has sufficiently low entropy. It can then sample from the resulting posterior distribution and compute various statistics.

Background

Deterministic sorting algorithms rank lists by comparing pairs of items: if an item is greater than another, it is moved higher in the list. Sometimes, however, it is uncertain which item is greater. For example, when ranking films by how much you enjoyed them, you may not be able to say confidently which of two films you liked more, and repeated judgments of the same pair may not even agree.

Estimating rankings in the presence of this uncertainty is called noisy sorting. A common approach is to model comparisons between items as depending on a latent numerical value (“skill” or “rating”) for each item. For example, the commonly used Bradley–Terry model assumes that

$$p(i > j) = \sigma(s_i - s_j)$$

where $s_i$ denotes the latent skill of item $i$ and $\sigma$ is the logistic function.
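
For example, here is the Bradley–Terry win probability in Python (a toy illustration; the skill values are made up):

import math

def bradley_terry_p(s_i: float, s_j: float) -> float:
    # Probability that item i beats item j: logistic of the skill gap.
    return 1.0 / (1.0 + math.exp(-(s_i - s_j)))

print(bradley_terry_p(1.0, 0.0))  # a one-point skill advantage wins ~73% of the time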

Motivation

Gwern Branwen’s Resorter is a CLI tool for manual noisy sorting of lists based on the Bradley–Terry model. However, its frequentist approach is limiting: it produces only point estimates of the ratings, with no posterior distribution to quantify uncertainty in the ranking, to sample from, or to consult when deciding which comparison to ask next or when to stop.

As a project to learn more about Bayesian inference, I decided to build a Bayesian version of Resorter.

Thurstonian Model

The Bradley–Terry model is quite nice for maximum-likelihood estimation, but I was unable to get it to work well in a Bayesian setting. Given a normal prior on the skills $\boldsymbol{s} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma)$, the posterior density is

$$p(\boldsymbol{s} \mid \boldsymbol{w} > \boldsymbol{l}) = \varphi(\boldsymbol{s}; \boldsymbol{\mu}, \Sigma) \prod_{i=1}^{m} \sigma(s_{w_i} - s_{l_i}) \left[ \int_{\mathbb{R}^n} \varphi(\boldsymbol{s}; \boldsymbol{\mu}, \Sigma) \prod_{i=1}^{m} \sigma(s_{w_i} - s_{l_i}) \, d\boldsymbol{s} \right]^{-1}$$

where $\varphi$ denotes the normal density, $m$ is the number of comparisons, and $w_i$ and $l_i$ are the winning and losing items in comparison $i$. It appears some researchers have designed efficient sampling procedures for this posterior, but frankly they are beyond me.

Instead, I used a probability model very similar to Bradley–Terry, but with a probit link instead of a logit link. That is, under the Thurstonian model,

$$p(i > j) = \Phi(s_i - s_j)$$

where $\Phi$ denotes the cumulative distribution function of the standard normal distribution.
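
Concretely (a quick illustration, not Fullrank's code), the probit link assigns a slightly sharper win probability to the same unit skill gap than the logistic link does:

from scipy.stats import norm

print(norm.cdf(1.0))  # ~0.841 win probability for a unit skill gap,
                      # versus ~0.731 under the logistic link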

I’ll now derive the posterior density in the Thurstonian model. For convenience, I’ll represent the observed comparisons as a matrix $D \in \mathbb{R}^{m \times n}$ mapping score vectors to the probits of each comparison. That is, $D_{ij} = 1$ if item $j$ wins comparison $i$, $D_{ij} = -1$ if item $j$ loses comparison $i$, and $D_{ij} = 0$ otherwise.
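
As a sketch (not Fullrank's actual internals), such a matrix can be built from (winner, loser) index pairs, and $\Phi(D\boldsymbol{s})$ then gives the per-comparison win probabilities:

import numpy as np
from scipy.stats import norm

def design_matrix(comparisons: list[tuple[int, int]], n: int) -> np.ndarray:
    # Row i has +1 in the winner's column and -1 in the loser's column.
    D = np.zeros((len(comparisons), n))
    for i, (winner, loser) in enumerate(comparisons):
        D[i, winner] = 1.0
        D[i, loser] = -1.0
    return D

s = np.array([0.5, 0.0, -0.5])            # hypothetical skills for three items
D = design_matrix([(0, 1), (2, 1)], n=3)  # item 0 beat item 1; item 2 beat item 1
print(norm.cdf(D @ s))                    # [0.69, 0.31]: model probabilities of the observed wins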

$$p(\boldsymbol{s} \mid D) = \frac{p(\boldsymbol{s})\, p(D \mid \boldsymbol{s})}{p(D)} = \varphi(\boldsymbol{s}; \boldsymbol{\mu}, \Sigma) \frac{\Pr[\boldsymbol{z} < D\boldsymbol{s}]}{\Pr[\boldsymbol{z} < D\boldsymbol{t}]}$$

where $\boldsymbol{t} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma)$ and $\boldsymbol{z} \sim \mathcal{N}(0, I_m)$ are independent.

It turns out that the normalization constant can be represented quite nicely using the multivariate normal CDF $\Phi_m$:

$$\Pr[\boldsymbol{z} < D\boldsymbol{t}] = \Pr[\boldsymbol{z} < D(\boldsymbol{t} - \boldsymbol{\mu}) + D\boldsymbol{\mu}] = \Pr[\boldsymbol{z} - D(\boldsymbol{t} - \boldsymbol{\mu}) < D\boldsymbol{\mu}]$$

And since $D(\boldsymbol{t} - \boldsymbol{\mu}) \sim \mathcal{N}(0, D\Sigma D^T)$, we have

$$\boldsymbol{z} - D(\boldsymbol{t} - \boldsymbol{\mu}) \sim \mathcal{N}(0,\, I_m + D\Sigma D^T) \quad\Longrightarrow\quad \Pr[\boldsymbol{z} < D\boldsymbol{t}] = \Phi_m(D\boldsymbol{\mu};\, I_m + D\Sigma D^T)$$

Likewise, $\Pr[\boldsymbol{z} < D\boldsymbol{s}] = \Phi_m(D\boldsymbol{s})$. Therefore,

$$p(\boldsymbol{s} \mid D) = \varphi(\boldsymbol{s}; \boldsymbol{\mu}, \Sigma)\, \Phi_m(D\boldsymbol{s}) \left[\Phi_m(D\boldsymbol{\mu};\, I_m + D\Sigma D^T)\right]^{-1}$$

This is called a unified skew-normal (SUN) distribution, and it arises as the posterior in most probit models. Using the convention of Arellano-Valle and Azzalini¹, we can write

$$\boldsymbol{s} \mid D \sim \mathrm{SUN}_{n,m}\!\left(\boldsymbol{\mu},\, \Sigma,\, \Sigma D^T,\, D\boldsymbol{\mu},\, I_m + D\Sigma D^T\right)$$
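
As a sanity check (my own verification script, not part of Fullrank), the closed-form normalization constant can be compared against a direct Monte Carlo estimate of $\Pr[\boldsymbol{z} < D\boldsymbol{t}]$, with SciPy's multivariate normal CDF playing the role of $\Phi_m$:

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
n, m = 3, 2
mu = np.array([0.5, 0.0, -0.5])
Sigma = np.eye(n)
D = np.array([[1.0, -1.0, 0.0],   # +1 for the winner, -1 for the loser
              [0.0, -1.0, 1.0]])

# Closed form: Phi_m(D mu; I_m + D Sigma D^T).
Gamma = np.eye(m) + D @ Sigma @ D.T
closed = multivariate_normal.cdf(D @ mu, mean=np.zeros(m), cov=Gamma)

# Monte Carlo: Pr[z < D t] with t ~ N(mu, Sigma) and z ~ N(0, I_m).
t = rng.multivariate_normal(mu, Sigma, size=200_000)
z = rng.standard_normal((200_000, m))
mc = np.mean(np.all(z < t @ D.T, axis=1))

print(closed, mc)  # the two estimates should agree to roughly two decimal places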

Efficient Sampling

Arellano-Valle and Azzalini¹ also give us a convolutional representation of the posterior. If

$$\boldsymbol{u} \sim \mathcal{N}\!\left(0,\, \Sigma - \Sigma D^T (I_m + D\Sigma D^T)^{-1} D\Sigma\right) \quad\text{and}\quad \boldsymbol{v} \sim \mathcal{N}_{-D\boldsymbol{\mu}}\!\left(0,\, I_m + D\Sigma D^T\right)$$

where $\mathcal{N}_{\boldsymbol{\tau}}$ denotes the normal distribution truncated below at $\boldsymbol{\tau}$, then

$$\boldsymbol{\mu} + \boldsymbol{u} + \Sigma D^T (I_m + D\Sigma D^T)^{-1} \boldsymbol{v} \sim \mathrm{SUN}_{n,m}\!\left(\boldsymbol{\mu},\, \Sigma,\, \Sigma D^T,\, D\boldsymbol{\mu},\, I_m + D\Sigma D^T\right)$$

Fullrank exploits this fact to efficiently sample from the posterior using samples of $\boldsymbol{u}$ and $\boldsymbol{v}$.
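
Here is a minimal sketch of that scheme (my simplification, not Fullrank's implementation). The truncated draw below uses naive rejection sampling, which becomes hopeless as the number of comparisons grows; a serious implementation needs a proper truncated-multivariate-normal sampler:

import numpy as np

def sample_posterior(mu, Sigma, D, n_samples, rng):
    # Convolutional representation: s = mu + u + Sigma D^T Gamma^{-1} v,
    # with u Gaussian and v a truncated Gaussian.
    n, m = Sigma.shape[0], D.shape[0]
    Gamma = np.eye(m) + D @ Sigma @ D.T
    K = Sigma @ D.T @ np.linalg.inv(Gamma)   # Sigma D^T (I_m + D Sigma D^T)^{-1}

    # u ~ N(0, Sigma - Sigma D^T Gamma^{-1} D Sigma)
    u = rng.multivariate_normal(np.zeros(n), Sigma - K @ D @ Sigma, size=n_samples)

    # v ~ N(0, Gamma) truncated below at -D mu, via naive rejection sampling.
    lower = -(D @ mu)
    v = np.empty((n_samples, m))
    filled = 0
    while filled < n_samples:
        cand = rng.multivariate_normal(np.zeros(m), Gamma, size=n_samples)
        keep = cand[np.all(cand > lower, axis=1)]
        take = min(len(keep), n_samples - filled)
        v[filled:filled + take] = keep[:take]
        filled += take

    return mu + u + v @ K.T

# usage: samples = sample_posterior(mu, Sigma, D, 10_000, np.random.default_rng(0))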

Optimal Comparison Choice

Ideally, Fullrank should always present the user with the most informative comparison: the one whose probit has maximal entropy under the current posterior.

Unified skew-normal distributions are closed under full-rank linear transformations, so each comparison probit is distributed according to a one-dimensional SUN distribution. At least in the case of a standard normal prior, every comparison probit has identical first and second moments, so intuitively the entropy should be controlled by the skewness.

Fullrank currently assumes that the entropy is decreasing in the $L^2$ norm of the skewness parameter $\Delta$, which seems to work well in practice. However, I haven’t been able to prove that this works, and it definitely fails for certain non-scalar choices of prior covariance (though these are currently not supported anyway). If you have any better ideas for choosing comparisons, please let me know!
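
For concreteness, here is my reading of that selection rule as a sketch (Fullrank's actual code may differ): score each candidate pair by the $L^2$ norm of the skewness parameter of its probit's one-dimensional SUN law, obtained via the closure property as a linear transform of the posterior's skewness matrix $\Sigma D^T$, and ask about the pair with the smallest norm:

import numpy as np
from itertools import combinations

def next_comparison(Sigma: np.ndarray, D: np.ndarray) -> tuple[int, int]:
    # The probit of pair (i, j) is d^T s with d = e_i - e_j; under the
    # closure property its skewness parameter is d^T Sigma D^T. Assuming
    # entropy decreases in ||Delta||_2, the smallest norm is (heuristically)
    # the most informative comparison to ask next.
    n = Sigma.shape[0]
    best_pair, best_norm = None, np.inf
    for i, j in combinations(range(n), 2):
        d = np.zeros(n)
        d[i], d[j] = 1.0, -1.0
        delta_norm = np.linalg.norm(d @ Sigma @ D.T)
        if delta_norm < best_norm:
            best_pair, best_norm = (i, j), delta_norm
    return best_pair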

Installation and Usage

# install via pip
pip install git+https://github.com/max-niederman/fullrank.git

# interactively compare items
fullrank compare items.txt > comparisons.json
# infer a ranking distribution and compute statistics
fullrank stats < comparisons.json
# compute raw samples in JSONL format for your own processing
fullrank raw-sample 10000 samples.jsonl < comparisons.json

Footnotes

  1. R. B. Arellano-Valle and A. Azzalini, “Some properties of the unified skew-normal distribution,” Statistical Papers, vol. 63, pp. 461–487, 2022. doi:10.1007/s00362-021-01235-2