Notes on Regularization
In some cases, e.g., if the data is sparse, the iterative algorithms underlying the parameter inference functions might not converge. A pragmatic solution to this problem is to add a little bit of regularization.
Inference functions in choix provide a generic regularization argument: alpha. When \(\alpha = 0\), regularization is turned off; setting \(\alpha > 0\) turns it on. In practice, if regularization is needed, we recommend starting with a small value (e.g., \(10^{-4}\)) and increasing it if necessary.
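For instance, on a sparse dataset where one item never wins, the maximum-likelihood estimate is unbounded, and a small \(\alpha\) keeps it finite. A minimal sketch (the dataset and the value of alpha are illustrative):

    import choix

    n_items = 4
    # Sparse pairwise data, given as (winner, loser) tuples: item 3 never
    # wins, so without regularization its estimate diverges to -infinity.
    data = [(0, 1), (1, 2), (2, 3)]

    # A small alpha regularizes the problem and keeps the estimate finite.
    params = choix.ilsr_pairwise(n_items, data, alpha=1e-4)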
Below, we briefly explain how the regularization parameter is used inside the various parameter inference functions.
Markov-chain based algorithms
For Markov-chain based algorithms such as Luce Spectral Ranking and Rank Centrality, \(\alpha\) is used to initialize the transition rates of the Markov chain.
In the special case of pairwise-comparison data, this can be loosely understood as placing, for each pair of items, an independent Beta prior on the comparison outcome probability.
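The following toy sketch illustrates the mechanism (a simplified construction of our own, not choix's actual implementation; the helper name spectral_scores is made up): every pairwise transition rate starts at \(\alpha\), observed comparisons add to the rates, and scores are read off the chain's stationary distribution.

    import numpy as np

    def spectral_scores(n_items, data, alpha=1e-4):
        # Every pairwise transition rate is initialized to alpha.
        rates = np.full((n_items, n_items), alpha)
        np.fill_diagonal(rates, 0.0)
        for winner, loser in data:
            rates[loser, winner] += 1.0  # losers transition towards winners
        # Uniformized discrete-time chain with the same stationary
        # distribution; the factor 2 creates self-loops (aperiodicity).
        scale = 2.0 * rates.sum(axis=1).max()
        trans = rates / scale
        np.fill_diagonal(trans, 1.0 - trans.sum(axis=1))
        # Power iteration for the stationary distribution.
        dist = np.full(n_items, 1.0 / n_items)
        for _ in range(5000):
            dist = dist @ trans
        return np.log(dist) - np.mean(np.log(dist))  # centered log-scores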
Minorization-maximization algorithms
In the case of minorization-maximization algorithms, the exponentiated model parameters \(e^{\theta_1}, \ldots, e^{\theta_n}\) are each endowed with an independent Gamma prior distribution with scale \(\alpha + 1\). See Caron & Doucet (2012) for details.
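A rough sketch of one such update for pairwise data, with the regularization folded into the classical minorization-maximization step for the Bradley-Terry model (the helper mm_update is illustrative, not choix's internal API):

    import numpy as np

    def mm_update(gammas, data, alpha=1e-4):
        # One minorization-maximization step for the Bradley-Terry model
        # on pairwise data given as (winner, loser) tuples.
        n = len(gammas)
        wins = np.zeros(n)
        denoms = np.zeros(n)
        for winner, loser in data:
            wins[winner] += 1.0
            inv = 1.0 / (gammas[winner] + gammas[loser])
            denoms[winner] += inv
            denoms[loser] += inv
        # The Gamma prior contributes the two alpha terms; with alpha = 0
        # this is the standard (unregularized) update.
        return (wins + alpha) / (denoms + alpha)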
Other algorithms
The scipy-based optimization functions use an \(\ell_2\)-regularizer on the parameters \(\theta_1, \ldots, \theta_n\). In other words, the parameters are each endowed with an independent Gaussian prior with variance \(1 / \alpha\).
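Concretely, for pairwise data the penalized objective looks as follows (a sketch; the function name penalized_nll is illustrative, and the result could be minimized with, e.g., scipy.optimize.minimize):

    import numpy as np

    def penalized_nll(theta, data, alpha):
        # Bradley-Terry negative log-likelihood plus the l2 penalty
        # (alpha / 2) * ||theta||^2, i.e., the negative log-density of
        # independent Gaussian priors with variance 1 / alpha.
        nll = 0.0
        for winner, loser in data:
            # -log P(winner beats loser) = log(1 + exp(theta_l - theta_w))
            nll += np.logaddexp(0.0, theta[loser] - theta[winner])
        return nll + 0.5 * alpha * np.dot(theta, theta)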