Towards Adaptive Off-Policy Evaluation of Ranking Policies under Agnostic and Stochastic Behavior Models - Yahoo! JAPAN Research & Development

Publications

Conference (International) Towards Adaptive Off-Policy Evaluation of Ranking Policies under Agnostic and Stochastic Behavior Models

Haruka Kiyohara (Tokyo Institute of Technology), Nobuyuki Shimizu, Yasuo Yamamoto

The Undergraduate Consortium at SIGKDD 2022 (UC at SIGKDD 2022)

2022.8.14

In many real-world recommender and search systems, presenting a ranked list of relevant items is crucial for increasing user engagement or revenue. Off-Policy Evaluation (OPE) for ranking policies is gaining growing attention, as it enables offline evaluation of new policies using only logged data. Inverse Propensity Scoring (IPS) is a prevalent approach in (general) OPE. Unfortunately, the naive application of IPS in the ranking setting often suffers from critical variance issues due to the combinatorially large action space and the resulting huge importance weights. To reduce the variance, existing estimators introduce user behavior assumptions that eliminate unnecessary importance weights. However, a strong assumption may in turn incur serious bias, making "assumption selection" a challenging problem. To tackle this issue, we propose the Adaptive IPS (AIPS) estimator, which interpolates among the existing estimators. AIPS does this by using a class of importance weights that includes those of the existing estimators. By tuning the interpolation hyperparameters of the importance weight in a data-driven way, the proposed estimator adaptively reduces the variance of IPS without incurring high bias. The empirical results demonstrate that the proposed estimator works reliably across a range of user behavior models, including stochastic ones.
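The trade-off described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes a factorizable policy, so that the importance weight of a ranking is a product of per-position ratios, and it compares three commonly used weightings: the full-ranking product (IPS), a per-position ratio under an independence assumption (labeled `iips` here), and a prefix product under a cascade assumption (labeled `rips` here). The `window` option is a hypothetical interpolation introduced only for illustration; the parameterization actually used by AIPS is not given in the abstract. All function and variable names are assumptions.

```python
import numpy as np


def position_weights(ratios, kind, window=0):
    """Importance weight applied to the reward at each position.

    ratios: array of shape (n, K) holding the per-position ratios
            pi_e(a_k | x) / pi_b(a_k | x), assuming a factorizable policy.
    kind:   "ips"    -> product over all K positions
            "rips"   -> product over positions 1..k (cascade assumption)
            "iips"   -> ratio at position k only (independence assumption)
            "window" -> product over positions within `window` of k
                        (hypothetical interpolation, not the paper's form)
    """
    ratios = np.asarray(ratios, dtype=float)
    n, K = ratios.shape
    weights = np.empty((n, K))
    for k in range(K):
        if kind == "ips":
            lo, hi = 0, K
        elif kind == "rips":
            lo, hi = 0, k + 1
        elif kind == "iips":
            lo, hi = k, k + 1
        elif kind == "window":
            lo, hi = max(0, k - window), min(K, k + window + 1)
        else:
            raise ValueError(f"unknown kind: {kind}")
        # Multiply the ratios of the positions assumed to affect position k.
        weights[:, k] = ratios[:, lo:hi].prod(axis=1)
    return weights


def estimate_value(ratios, rewards, kind, window=0):
    """Average over logged rankings of the weighted sum of position rewards."""
    w = position_weights(ratios, kind, window)
    return float((w * np.asarray(rewards, dtype=float)).sum(axis=1).mean())


# Toy logged data: 2 rankings of length 3 (values are made up for illustration).
ratios = np.array([[2.0, 0.5, 1.0],
                   [1.0, 2.0, 0.5]])
rewards = np.ones((2, 3))

for kind in ("ips", "rips", "iips"):
    print(kind, estimate_value(ratios, rewards, kind))
```

In this sketch, `window=0` reproduces the per-position weighting and `window=K-1` reproduces the full-ranking product, which shows one way a single tunable parameter can span estimators with different bias and variance. Multiplying fewer ratios generally lowers variance but introduces bias when the dropped positions do influence user behavior.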

Paper: Towards Adaptive Off-Policy Evaluation of Ranking Policies under Agnostic and Stochastic Behavior Models (external site)