Conference (International) What Rankers Can be Statistically Distinguished in Multileaved Comparisons?
Makoto P. Kato (University of Tsukuba), Akiomi Nishida, Tomohiro Manabe, Sumio Fujita, Takehiro Yamamoto (University of Tsukuba)
The 29th ACM International Conference on Information and Knowledge Management (CIKM2020)
This paper presents findings from an empirical study of multileaved comparisons, an efficient online evaluation methodology, in a commercial Web service. The most important difference from previous studies is the number of rankers involved in the online evaluation: we compared 30 rankers for around 90 days by multileaved comparisons. The relatively large number of rankers allowed us to answer several questions that previous work could not address because it involved only a small number of rankers: How large a ranking difference is required for rankers to be statistically distinguished? How many impressions are necessary to find statistically significant differences between correlated rankers? How large a difference in offline evaluation is needed to predict a significant difference in a multileaved comparison? We answer these questions with the results of the multileaved comparisons, and generalize some of the findings through simulation-based experiments.