CONFERENCE (INTERNATIONAL)

Article De-duplication Using Distributed Representations

Shumpei Okura, Yukihiro Tagami and Akira Tajima

The 25th International Conference on World Wide Web (Posters) <Best Poster Runner-up> (WWW 2016)

April 11, 2016

In news recommendation systems, eliminating redundant information is as important as providing articles that interest users. We propose a method that quantifies the similarity of articles based on their distributed representations, learned with category information as weak supervision. This method is suited to evaluation under tight time constraints, since estimating similarities requires only low-dimensional inner product calculations. Results from human evaluation and from online A/B testing suggest that the proposed method is effective, especially for quantifying middle-level similarities. The method is currently used on Yahoo! JAPAN’s front page, which has millions of users per day and billions of page views per month.
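
To illustrate why inner-product similarity is cheap at serving time, the following is a minimal sketch, not the authors' implementation. It assumes each article already has a low-dimensional vector, and it assumes a greedy filtering rule and a threshold value that the abstract does not specify. The names `inner_product`, `deduplicate`, and `threshold` are hypothetical.

```python
from typing import List, Sequence


def inner_product(u: Sequence[float], v: Sequence[float]) -> float:
    """Similarity score between two article vectors (plain dot product)."""
    return sum(a * b for a, b in zip(u, v))


def deduplicate(ranked_vectors: List[Sequence[float]], threshold: float) -> List[int]:
    """Return indices of articles to keep, in ranked order.

    Assumed rule (not stated in the abstract): walk the ranked list and
    skip an article when its inner product with any already-kept article
    exceeds `threshold`.
    """
    kept: List[int] = []
    for i, vec in enumerate(ranked_vectors):
        if all(inner_product(vec, ranked_vectors[j]) <= threshold for j in kept):
            kept.append(i)
    return kept


# Toy 2-dimensional vectors; real representations would be learned.
articles = [
    [1.0, 0.0],  # article 0
    [0.9, 0.1],  # article 1, close to article 0
    [0.0, 1.0],  # article 2, unrelated to article 0
]
print(deduplicate(articles, threshold=0.5))  # [0, 2]
```

Each comparison costs one pass over the vector dimensions, so the work per candidate article grows with the number of kept articles times the vector length, which stays small for low-dimensional representations.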
