class: center, middle, inverse, title-slide

# Analyzing Preference Data with the BayesMallows Package
### Øystein Sørensen
### University of Oslo
### ERUM2020

---

# Ranks and Preferences

### Data

- Which ad did the user click?
- Which product did the user buy?

--

### Action

- What should be recommended to the customer?

---

# Probabilistic Recommendations

- Quantifying the uncertainty in predicted preferences gives a better decision-making tool
  - Avoids spamming
  - May change the optimal recommendation

--

- Mallows' rank model
  - `\(\rho\)` is the latent (mean) ranking of the customer's segment
  - `\(r\)` is an observed ranking

`$$P\left(r | \alpha, \rho\right) \propto \exp\left[\frac{-\alpha d(r, \rho)}{n}\right] 1_{\mathcal{P}_{n}}(r)$$`

---

# Bayesian Mallows model

- Fully probabilistic framework
- Clustering of users with similar preferences
- Preferences in the form of rankings or choices
- Handles inconsistent rankings
- Predicts preferences based on sparse data

--

```r
library(BayesMallows)
```

- Heavy computations implemented with `Rcpp` and `RcppArmadillo`
  - Metropolis-Hastings
  - Importance sampling
  - Permutation distances

---

# Example Dataset

- 60 users were each asked for their preference in 25 randomly chosen pairs of beaches

<center>
<img src="figures/beaches.png" height="400">
</center>

---

# Compute Posterior Distribution

```r
fit <- compute_mallows(
  preferences = beach_preferences,
  nmc = 20000,
  save_aug = TRUE,
  verbose = TRUE
)
#> Generating transitive closure of preferences.
#> Generating initial ranking.
#> First 1000 iterations of Metropolis-Hastings algorithm completed.
#> First 2000 iterations of Metropolis-Hastings algorithm completed.
#> ...
#> First 19000 iterations of Metropolis-Hastings algorithm completed.
#> [1] "Metropolis-Hastings algorithm completed. Post-processing data."
```

---

# MCMC Diagnostics

```r
assess_convergence(fit, parameter = "rho", items = 1:5)
```

<center>
<img src="figures/diagnostics.png" height="350">
</center>

---

# Posterior Distributions

```r
plot(fit, burnin = 10000, type = "alpha")
```

<center>
<img src="figures/posterior.png" height="350">
</center>

---

# Predict Preferences

- For each beach, what is the probability that it is among the user's top 3?
- Reminder: each user ranked only a subset of the beaches

```r
plot_top_k(fit, k = 3, burnin = 10000)
```

<center>
<img src="figures/top3.png" height="350">
</center>

---

# More Applications and Methodology

--

- **Original method**: Vitelli, Sørensen, Crispino, Frigessi, Arjas, *Journal of Machine Learning Research* (2017), http://jmlr.org/papers/v18/15-481.html
- **Inconsistent preferences**: Crispino, Arjas, Vitelli, Barrett, Frigessi, *Annals of Applied Statistics* (2019), https://doi.org/10.1214/18-AOAS1203
- **BayesMallows package**: Sørensen, Crispino, Liu, Vitelli, *R Journal* (2020), https://arxiv.org/abs/1902.08432
- **Review paper**: Liu, Crispino, Scheel, Vitelli, Frigessi, *Annual Review of Statistics and Its Application* (2019), https://doi.org/10.1146/annurev-statistics-031017-100213
- **Clicking data**: Liu, Reiner, Frigessi, Scheel, *Knowledge-Based Systems* (2019), https://doi.org/10.1016/j.knosys.2019.104960
- **Time series**: Asfaw, Vitelli, Sørensen, Arjas, Frigessi, *Stat* (2016), https://doi.org/10.1002/sta4.132
- **Musicology(!!)**: Barrett, Crispino, *Journal of New Music Research* (2018), https://doi.org/10.1080/09298215.2018.1437187

---

# Thank you!

[![CRAN\_Status\_Badge](http://www.r-pkg.org/badges/version/BayesMallows)](https://cran.r-project.org/package=BayesMallows)
[![Build Status](https://travis-ci.org/ocbe-uio/BayesMallows.svg?branch=master)](https://travis-ci.org/ocbe-uio/BayesMallows)
[![codecov](https://codecov.io/gh/ocbe-uio/BayesMallows/branch/master/graph/badge.svg)](https://codecov.io/gh/ocbe-uio/BayesMallows)

- Download `BayesMallows` from CRAN

```r
install.packages("BayesMallows")
```

- Feedback is much appreciated!
  - https://github.com/ocbe-uio/BayesMallows

- Slides will be posted at http://osorensen.rbind.io/
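
---

# Appendix: Mallows Density Sketch

A minimal base-R sketch of the unnormalized Mallows density from the "Probabilistic Recommendations" slide. The slides do not say which distance `\(d\)` is used, so the footrule distance here is an assumption. The helper names `footrule` and `mallows_unnormalized` and the example rankings are hypothetical and are not part of the `BayesMallows` API.

```r
# Footrule (L1) distance between two rankings of the same n items.
# Assumption: the slides leave the distance d unspecified.
footrule <- function(r, rho) sum(abs(r - rho))

# Unnormalized Mallows density exp(-alpha * d(r, rho) / n).
# Returns 0 when r is not a permutation of 1..n (the indicator term).
mallows_unnormalized <- function(r, rho, alpha) {
  n <- length(rho)
  if (length(r) != n || !setequal(r, seq_len(n))) return(0)
  exp(-alpha * footrule(r, rho) / n)
}

rho  <- c(1, 2, 3, 4)  # hypothetical latent ranking
near <- c(2, 1, 3, 4)  # one adjacent swap: footrule distance 2
far  <- c(4, 3, 2, 1)  # full reversal: footrule distance 8

mallows_unnormalized(near, rho, alpha = 2)
mallows_unnormalized(far, rho, alpha = 2)
```

- Rankings closer to `\(\rho\)` get a larger unnormalized density
- A larger `\(\alpha\)` concentrates the distribution more tightly around `\(\rho\)`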