Workshop (International) Generalized Weighted-Prediction-Error Dereverberation with Varying Source Priors for Reverberant Speech Recognition
Toru Taniguchi, Aswin Shanmugam Subramanian (Johns Hopkins Univ.), Xiaofei Wang (Johns Hopkins Univ.), Dung Tran, Yuya Fujita, and Shinji Watanabe (Johns Hopkins Univ.)
2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2019)
Weighted-prediction-error (WPE) is a well-known dereverberation method, used in particular to mitigate the degradation of automatic speech recognition (ASR) performance in distant-speaker scenarios. WPE usually assumes that the desired source signals follow a predefined prior, such as a Gaussian with time-varying variances (TVG). Although WPE works well in practice under this assumption, the appropriate prior generally depends on the source and cannot be known before processing, so the prior must be estimated on demand, e.g., for each utterance. To this end, we extend WPE by introducing a complex-valued generalized Gaussian (CGG) prior, together with an estimator of its shape parameter within the processing, so that the method can handle a variety of super-Gaussian sources. The shape parameter is estimated blindly by a sub-network added to WPE-CGG, with the whole WPE-CGG pipeline treated as a differentiable neural network. The sub-network can be trained by backpropagation from the outputs of the whole network using any criterion, such as signal-level mean squared error, or even ASR errors if the WPE-CGG computational graph is connected to that of the ASR network. Experimental results show that the proposed method outperforms conventional baseline methods that use the TVG prior, without requiring careful tuning of the shape parameter value during evaluation.
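To make the role of the prior concrete, here is a minimal NumPy sketch of WPE for a single STFT frequency bin, solved by iteratively reweighted least squares. It assumes that a CGG prior with shape parameter p enters only through per-frame weights (|d_t|^2 + ε)^(p/2 − 1), a form borrowed from ℓp-norm formulations of multichannel linear prediction; as p → 0 this becomes the TVG-style weight 1/λ_t, and p = 2 gives plain least squares. The function name `wpe_cgg_bin`, its parameters, and the fixed value of p are illustrative assumptions. The sketch omits the learned shape-parameter sub-network and is not the paper's exact update rule.

```python
import numpy as np


def wpe_cgg_bin(x, taps=10, delay=3, shape_p=0.5, iterations=3, eps=1e-8):
    """WPE-style dereverberation of one STFT frequency bin (illustrative sketch).

    x: complex array of shape (channels, frames) for a single frequency bin.
    shape_p: assumed CGG shape parameter (hypothetical fixed value). Frames are
        weighted by (power + eps) ** (shape_p / 2 - 1); shape_p -> 0 gives the
        TVG-style weight 1 / power, and shape_p = 2 gives plain least squares.
    Returns the dereverberated signal d with the same shape as x.
    """
    channels, frames = x.shape

    # Stack delayed observations: rows k*C..(k+1)*C-1 hold x[:, t - delay - k].
    x_tilde = np.zeros((channels * taps, frames), dtype=complex)
    for k in range(taps):
        shift = delay + k
        if shift >= frames:
            continue  # this tap lies entirely before the first frame
        x_tilde[k * channels:(k + 1) * channels, shift:] = x[:, :frames - shift]

    d = x.astype(complex)
    for _ in range(iterations):
        # Per-frame power of the current estimate, averaged over channels.
        power = np.mean(np.abs(d) ** 2, axis=0)
        # IRLS weights implied by the assumed generalized Gaussian prior.
        weight = (power + eps) ** (shape_p / 2.0 - 1.0)

        # Weighted normal equations with diagonal loading: (R + eps*I) G = P.
        weighted = x_tilde * weight          # scales each frame (column)
        R = weighted @ x_tilde.conj().T      # shape (C*taps, C*taps)
        P = weighted @ x.conj().T            # shape (C*taps, C)
        G = np.linalg.solve(R + eps * np.eye(R.shape[0]), P)

        # Subtract the predicted late reverberation from the observation.
        d = x - G.conj().T @ x_tilde
    return d
```

In use, `wpe_cgg_bin(Y[f], shape_p=0.5)` would be applied independently to each frequency bin `f` of a multichannel STFT `Y` of shape (frequencies, channels, frames). In the proposed method, p would instead be supplied per utterance by the estimator sub-network rather than fixed by hand.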