Conference (International) Summarization Based on Embedding Distributions

Hayato Kobayashi, Masaki Noguchi, Taichi Yatsuka

The 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015)


In this study, we consider a summarization method that uses document-level similarity based on embeddings, or distributed representations of words, where we assume that the embedding of each word can represent its “meaning.” We formalize the task as the problem of maximizing a submodular function defined as the negative sum of nearest-neighbor distances on embedding distributions, each of which represents the set of word embeddings in a document. We proved that our objective function is submodular and that our problem is asymptotically related to the KL-divergence between the probability density functions that correspond to a document and its summary in a continuous space. An experiment using a real dataset demonstrated that our method performed better than the existing method based on sentence-level similarity.
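The objective described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes Euclidean distance, hand-made 2-D "embeddings", a plain greedy selection with a sentence-count budget, and a hypothetical constant `empty_distance` that stands in for the nearest-neighbor distance when the summary is still empty. The function names (`nearest_distance`, `objective`, `greedy_summary`) are chosen for illustration only.

```python
import math


def nearest_distance(vec, summary_vecs, empty_distance=10.0):
    """Distance from `vec` to its nearest neighbour among `summary_vecs`.

    `empty_distance` is an assumed constant used while the summary is empty;
    it should exceed every real distance in the data.
    """
    if not summary_vecs:
        return empty_distance
    return min(math.dist(vec, s) for s in summary_vecs)


def objective(doc_vecs, summary_vecs):
    """Negative sum of nearest-neighbour distances from document words to summary words."""
    return -sum(nearest_distance(v, summary_vecs) for v in doc_vecs)


def greedy_summary(sentences, max_sentences):
    """Greedily pick sentence indices that most increase the objective.

    `sentences` is a list of sentences, each a list of word vectors (tuples).
    """
    doc_vecs = [v for sent in sentences for v in sent]
    chosen = []
    summary_vecs = []
    while len(chosen) < max_sentences:
        best_idx = None
        best_score = objective(doc_vecs, summary_vecs)
        for i, sent in enumerate(sentences):
            if i in chosen:
                continue
            score = objective(doc_vecs, summary_vecs + sent)
            if score > best_score:
                best_idx, best_score = i, score
        if best_idx is None:  # no remaining sentence improves the objective
            break
        chosen.append(best_idx)
        summary_vecs = summary_vecs + sentences[best_idx]
    return chosen


# Toy data (assumed): two sentences about "topic A", one about "topic B".
toy_sentences = [
    [(0.0, 0.0), (0.1, 0.0)],  # sentence 0, topic A
    [(5.0, 5.0), (5.1, 5.0)],  # sentence 1, topic B
    [(0.0, 0.1)],              # sentence 2, topic A again
]
print(greedy_summary(toy_sentences, max_sentences=2))
```

On this toy input, the greedy step prefers a second sentence that covers the uncovered region of the embedding space over one that repeats an already covered topic, which is the intuition behind comparing the whole distribution of word embeddings rather than individual sentences.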

Poster Download (700KB)

PDF : Summarization Based on Embedding Distributions