Do Extractive Summarization Algorithms Amplify Lexical Bias in News Articles? - Yahoo! JAPAN R&D


CONFERENCE (INTERNATIONAL) Do Extractive Summarization Algorithms Amplify Lexical Bias in News Articles?

Rei Shimizu (Waseda University), Sumio Fujita, Tetsuya Sakai (Waseda University)

The 8th ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR 2022)

July 11, 2022

Users who read news summaries on search engine result pages and social media may not access the original news articles. Hence, if the summaries are automatically generated, it is vital that they represent the contents of the original articles accurately and fairly. The present study is concerned with lexical bias in sentences: a sentence is considered lexically biased if it contains expressions that may strongly influence the reader’s opinion about a topic, either positively or negatively. More specifically, we are interested in whether extractive summarizers can amplify lexical bias by excessively extracting lexically biased sentences from the original article, thereby misrepresenting it. To address this question, we first introduce the Bias Independence Principle (BIP), which says that the probability that a sentence is selected by an extractive summarizer should be independent of whether the sentence is lexically biased. Based on the BIP, we propose an evaluation measure for extractive summarizers called the Bias Independence Criterion (BIC), which compares the distribution of the sentence scores for lexically biased sentences with that of the sentence scores for non-biased sentences. Moreover, based on the BIC, we define another measure called the Summary Feature Permutation Importance (SFPI) to examine whether a particular feature used by a feature-based extractive summarizer is responsible for amplifying lexical bias. Our experimental results suggest that (a) different extractive summarizers can amplify lexical bias to different degrees; (b) the features useful for extracting informative sentences may also be responsible for amplifying lexical bias; and (c) as mean ROUGE scores increase (implying higher informativeness), mean BIC scores also tend to increase (implying a higher concentration of lexically biased sentences).
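
As a rough illustration of the independence idea behind the BIP, the sketch below compares how often a summarizer selects lexically biased sentences with how often it selects non-biased ones. This is not the paper's BIC, whose exact definition is not given in this abstract (the BIC compares sentence-score distributions, not binary selections). The function names `selection_rate` and `selection_rate_gap` and the toy labels are hypothetical and introduced only for illustration.

```python
from typing import Sequence


def selection_rate(selected: Sequence[bool], mask: Sequence[bool]) -> float:
    """Fraction of the sentences flagged by `mask` that were selected."""
    picked = [s for s, m in zip(selected, mask) if m]
    if not picked:
        raise ValueError("the group defined by `mask` contains no sentences")
    return sum(picked) / len(picked)


def selection_rate_gap(selected: Sequence[bool], biased: Sequence[bool]) -> float:
    """Estimate P(selected | biased) - P(selected | non-biased).

    Illustrative assumption only: under the BIP this gap should be close
    to zero; a clearly positive gap suggests that biased sentences are
    over-represented in the summary.
    """
    if len(selected) != len(biased):
        raise ValueError("`selected` and `biased` must have the same length")
    non_biased = [not b for b in biased]
    return selection_rate(selected, biased) - selection_rate(selected, non_biased)


# Toy article with six sentences; the first two are labelled lexically biased.
biased = [True, True, False, False, False, False]
# A hypothetical summarizer that extracts three sentences, both biased ones included.
selected = [True, True, True, False, False, False]

gap = selection_rate_gap(selected, biased)
print(f"selection-rate gap: {gap:.2f}")
```

In this toy case both biased sentences are extracted but only one of the four non-biased ones, so the gap is large and positive. A summarizer consistent with the BIP would yield a gap near zero when averaged over many articles.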

Paper: Do Extractive Summarization Algorithms Amplify Lexical Bias in News Articles? (external link)