Conference (International): Coupled Hierarchical Dirichlet Process Mixtures for Simultaneous Clustering and Topic Modeling
Masamichi Shimosaka (Tokyo Institute of Technology), Takeshi Tsukiji (The University of Tokyo), Shoji Tominaga (The University of Tokyo), and Kota Tsubouchi
The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2016)
We propose a nonparametric Bayesian mixture model for grouped data that simultaneously optimizes topic extraction and group clustering while allowing every topic to be shared by every cluster. To make the model computationally efficient enough for today's large-scale data, we formulate it so that a closed-form variational Bayesian method can be used to approximate the posterior distribution. Experimental results on corpus data show that our model outperforms existing models, achieving a 22% improvement over the state-of-the-art model. Moreover, an experiment with location data from mobile phones shows that our model performs well in big data analysis.