CONFERENCE (INTERNATIONAL) 3-step Parallel Corpus Cleaning using Monolingual Crowd

Toshiaki Nakazawa (Kyoto University), Sadao Kurohashi (Kyoto University), Hayato Kobayashi, Hiroki Ishikawa, and Manabu Sassano

The 2015 Conference of the Pacific Association for Computational Linguistics (PACLING 2015)

May 19, 2015

A high-quality parallel corpus must be created manually to achieve good machine translation in domains that lack sufficient existing resources. Although asking professional translators to translate can improve corpus quality to some extent, mistakes cannot be avoided completely. In this paper, we propose a framework for cleaning an existing professionally translated parallel corpus quickly and cheaply. The proposed method uses a 3-step crowdsourcing procedure to efficiently detect and edit translation flaws while also guaranteeing the reliability of the edits. Experiments on a fashion-domain e-commerce-site (EC-site) parallel corpus show the effectiveness of the proposed method for parallel corpus cleaning.

Paper: 3-step Parallel Corpus Cleaning using Monolingual Crowd