Speaker Selective Beamformer with Keyword Mask Estimation - Yahoo! JAPAN R&D


WORKSHOP (INTERNATIONAL) Speaker Selective Beamformer with Keyword Mask Estimation

Yusuke Kida, Dung Tran, Motoi Omachi, Toru Taniguchi, and Yuya Fujita

2018 IEEE Workshop on Spoken Language Technology (SLT 2018)

December 18, 2018

This paper addresses the problem of automatic speech recognition (ASR) of a target speaker in background speech. The novelty of our approach is that we focus on a wakeup keyword, which is usually used for activating ASR systems like smart speakers. The proposed method first utilizes a DNN-based mask estimator to separate the mixture signal into the keyword signal uttered by the target speaker and the remaining background speech. Then the separated signals are used for calculating a beamforming filter to enhance the subsequent utterances from the target speaker. Experimental evaluations show that the trained DNN-based mask can selectively separate the keyword and background speech from the mixture signal. The effectiveness of the proposed method is also verified with Japanese ASR experiments, and we confirm that the character error rates are significantly improved by the proposed method for both simulated and real recorded test sets.
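The two-stage pipeline described in the abstract (mask estimation on the keyword segment, then a beamforming filter applied to later speech) can be illustrated with a minimal NumPy sketch. The abstract does not state which beamformer the authors use, so the mask-based MVDR formulation below is an assumption, as are the function names, array layouts, and the diagonal-loading constant. The masks are taken as given inputs; the DNN that would produce them is not modelled here.

```python
import numpy as np


def masked_covariance(stft, mask, eps=1e-10):
    """Mask-weighted spatial covariance for each frequency bin.

    stft: complex array, shape (channels, frames, freqs)  (assumed layout)
    mask: real array in [0, 1], shape (frames, freqs)
    Returns an array of shape (freqs, channels, channels).
    """
    weighted = stft * mask[None, :, :]
    cov = np.einsum("ctf,dtf->fcd", weighted, stft.conj())
    norm = mask.sum(axis=0) + eps  # total mask weight per frequency
    return cov / norm[:, None, None]


def mvdr_weights(cov_target, cov_noise, ref_channel=0, diag_load=1e-6):
    """MVDR filter per frequency from target and noise covariances.

    Uses the reference-channel form w = (Phi_n^-1 Phi_s) u / tr(Phi_n^-1 Phi_s).
    This choice of beamformer is an assumption, not taken from the paper.
    Returns an array of shape (freqs, channels).
    """
    n_ch = cov_noise.shape[-1]
    # Diagonal loading keeps the noise covariance well conditioned.
    trace = np.trace(cov_noise, axis1=-2, axis2=-1).real
    loaded = cov_noise + (diag_load * trace / n_ch)[:, None, None] * np.eye(n_ch)
    numerator = np.linalg.solve(loaded, cov_target)   # (freqs, ch, ch)
    denom = np.trace(numerator, axis1=-2, axis2=-1)   # (freqs,)
    return numerator[..., ref_channel] / (denom[:, None] + 1e-10)


def apply_beamformer(weights, stft):
    """Apply y[t, f] = w[f]^H x[:, t, f] to a multichannel STFT."""
    return np.einsum("fc,ctf->tf", weights.conj(), stft)
```

In this sketch, the keyword mask and its complement (standing in for the background-speech mask) would be applied to the STFT of the keyword segment to obtain `cov_target` and `cov_noise`, and the resulting weights would then be passed to `apply_beamformer` together with the STFT of the utterance that follows the keyword.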

Paper : Speaker Selective Beamformer with Keyword Mask Estimation (external link)