Conference (International): Improvements in Japanese Voice Search

Ken-ichi Iso, Edward Whittaker (Inferret Ltd.), Tadashi Emori, Jumpei Miyake

Annual Conference of the International Speech Communication Association (InterSpeech 2012)


This paper describes work on Japanese voice search at Yahoo! Japan. We first describe several implementation details of our WFST-based in-house decoder that make the voice-search task more efficient, including a simple but effective compressed WFST arc representation. This permits a decoder process requiring roughly 2 GB of memory for a 1-million-word vocabulary and a 35-million-N-gram language model. We then describe our baseline system using the decoder and compare it against two open-source decoders, Juicer and Julius. We also describe our initial attempts to adapt the baseline system through simple language model adaptation using manually transcribed, anonymized voice queries. To achieve this, we present a sequence of WFST operations that preserves the consistency of segmentation between manual and automatic transcriptions. We show that even this simple adaptation method yields relative reductions of up to 4.6% in sentence error rate and 8.2% in character error rate.
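The abstract does not say how the compressed arc representation is laid out. The sketch below is only a hypothetical illustration of the general idea of arc compression: it packs one arc into a single 64-bit word. The field widths (20-bit label, 8-bit quantized weight, 36-bit destination state), the weight range `MAX_WEIGHT`, and the names `pack_arc`/`unpack_arc` are all assumptions, not the authors' design. It also assumes acceptor-style arcs with a single label. A transducer would need a separate output-label field.

```python
from array import array

# Hypothetical field widths (assumptions, not taken from the paper).
LABEL_BITS = 20   # 2**20 = 1,048,576 labels, enough for a 1-million-word vocabulary
WEIGHT_BITS = 8   # weights quantized to 256 levels
STATE_BITS = 36   # destination-state index
assert LABEL_BITS + WEIGHT_BITS + STATE_BITS == 64

LABEL_MASK = (1 << LABEL_BITS) - 1
WEIGHT_MASK = (1 << WEIGHT_BITS) - 1
STATE_MASK = (1 << STATE_BITS) - 1

MAX_WEIGHT = 32.0  # assumed upper bound on -log-probability arc weights


def quantize(weight: float) -> int:
    """Clamp a weight to [0, MAX_WEIGHT] and map it to an 8-bit bucket."""
    weight = min(max(weight, 0.0), MAX_WEIGHT)
    return round(weight / MAX_WEIGHT * WEIGHT_MASK)


def dequantize(q: int) -> float:
    """Recover an approximate weight from its bucket index."""
    return q / WEIGHT_MASK * MAX_WEIGHT


def pack_arc(label: int, weight: float, next_state: int) -> int:
    """Pack (label, weight, next_state) into one unsigned 64-bit integer."""
    if not 0 <= label <= LABEL_MASK:
        raise ValueError("label does not fit in LABEL_BITS")
    if not 0 <= next_state <= STATE_MASK:
        raise ValueError("next_state does not fit in STATE_BITS")
    return ((next_state << (LABEL_BITS + WEIGHT_BITS))
            | (quantize(weight) << LABEL_BITS)
            | label)


def unpack_arc(code: int) -> tuple[int, float, int]:
    """Inverse of pack_arc (the weight is only approximately recovered)."""
    label = code & LABEL_MASK
    q = (code >> LABEL_BITS) & WEIGHT_MASK
    next_state = code >> (LABEL_BITS + WEIGHT_BITS)
    return label, dequantize(q), next_state


# Arcs stored contiguously as unsigned 64-bit words ('Q' typecode).
arcs = array("Q")
arcs.append(pack_arc(123456, 4.2, 987654321))
print(unpack_arc(arcs[0]))
```

The design trades weight precision (here, at most about 0.063 error per arc) for a fixed 8-byte arc that can live in one flat array. A real decoder would choose its field widths from the actual vocabulary size, state count, and tolerable quantization error.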