CONFERENCE (INTERNATIONAL) Article De-duplication Using Distributed Representations
The 25th International Conference on World Wide Web (Posters) <Best Poster Runner-up> (WWW 2016)
April 11, 2016
In news recommendation systems, eliminating redundant information is as important as providing interesting articles to users. We propose a method that quantifies the similarity of articles based on their distributed representations, learned with category information as weak supervision. This method is useful for evaluation under tight time constraints, since it requires only low-dimensional inner product calculations to estimate similarities. The experimental results from human evaluation and online performance in A/B testing suggest the effectiveness of our proposed method, especially for quantifying middle-level similarities. Currently, this method is used on Yahoo! JAPAN’s front page, which has millions of users per day and billions of page views per month.
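
The abstract notes that similarity is estimated with a low-dimensional inner product. The following is a minimal sketch of how such a score might be used to filter near-duplicates from a ranked list. It is not the authors' implementation: the greedy filtering loop, the names `inner_product` and `deduplicate`, the threshold value, and the toy 3-dimensional vectors are all assumptions made for illustration, and the step that learns the representations is not shown.

```python
from typing import Dict, List, Sequence


def inner_product(u: Sequence[float], v: Sequence[float]) -> float:
    """Similarity score between two article vectors (plain dot product)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def deduplicate(
    ranked_ids: List[str],
    vectors: Dict[str, Sequence[float]],
    threshold: float,
) -> List[str]:
    """Greedy filter (an assumption, not taken from the paper).

    Walk the ranked list in order and keep an article only if its
    similarity to every article already kept is below `threshold`.
    """
    kept: List[str] = []
    for article_id in ranked_ids:
        candidate = vectors[article_id]
        if all(inner_product(candidate, vectors[k]) < threshold for k in kept):
            kept.append(article_id)
    return kept


# Hypothetical toy vectors; real representations would be learned.
vectors = {
    "a1": [0.9, 0.1, 0.0],
    "a2": [0.8, 0.2, 0.1],  # close to a1, so it is filtered out
    "a3": [0.0, 0.1, 0.9],
}
print(deduplicate(["a1", "a2", "a3"], vectors, threshold=0.7))
```

Because each comparison is a single short dot product, the cost per candidate grows only with the number of articles already kept and the vector dimension, which is consistent with the tight time constraints the abstract mentions. The threshold would need to be tuned on real data.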