Conference (International) Rapid Development of a Corpus with Discourse Annotations using Two-stage Crowdsourcing

Daisuke Kawahara (Kyoto University), Yuichiro Machida (Kyoto University), Tomohide Shibata (Kyoto University), Sadao Kurohashi (Kyoto University), Hayato Kobayashi, Manabu Sassano

The 25th International Conference on Computational Linguistics (COLING 2014)


We present a novel approach for rapidly developing a corpus with discourse annotations using crowdsourcing. Although discourse annotation typically requires considerable time and cost owing to its complexity, we obtain discourse annotations in an extremely short time while retaining good annotation quality by crowdsourcing two annotation subtasks. In fact, our experiment to create a corpus comprising 30,000 Japanese sentences took less than eight hours to run. Based on this corpus, we also develop a supervised discourse parser and evaluate its performance to verify the usefulness of the acquired corpus.