Conference (International) Deterministic Word Segmentation Using Maximum Matching with Fully Lexicalized Rules

Manabu Sassano

Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers (EACL 2014)


We present a fast word segmentation algorithm that scans an input sentence deterministically in a single pass. The algorithm is based on simple maximum matching, extended with the execution of fully lexicalized transformational rules. Because rule matching is incorporated into dictionary lookup, segmentation is fast. We evaluated the proposed method on Japanese word segmentation. Experimental results show that our segmenter runs considerably faster than state-of-the-art systems and yields practical accuracy when a more accurate segmenter or an annotated corpus is available.
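
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names `build_lexicon` and `segment`, the toy English strings, and the use of a plain `dict` are all assumptions made for illustration (a real segmenter would presumably use a trie-like structure for lookup). The sketch assumes that a fully lexicalized rule can be represented as a surface string mapped to a fixed segmentation and stored in the same table as ordinary words, so that a single longest-match lookup handles both.

```python
from typing import Dict, Iterable, List


def build_lexicon(words: Iterable[str],
                  rules: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Merge plain words and lexicalized rules into one lookup table.

    A plain word maps to itself as a single segment. A rule maps a
    surface string to the segmentation that should replace it.
    (Hypothetical representation, chosen for this sketch.)
    """
    lexicon = {w: [w] for w in words}
    lexicon.update(rules)  # rules share the table with ordinary words
    return lexicon


def segment(text: str, lexicon: Dict[str, List[str]]) -> List[str]:
    """Left-to-right maximum matching in a single deterministic pass."""
    max_len = max(map(len, lexicon), default=1)
    out: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in lexicon:
                out.extend(lexicon[chunk])  # a word or a rule's output
                i += length
                break
        else:
            # No entry matched: emit one character as an unknown token.
            out.append(text[i])
            i += 1
    return out


words = ["the", "them", "men", "mend", "end"]

# Plain maximum matching greedily picks "them" first.
plain = build_lexicon(words, {})
print(segment("themend", plain))       # ['them', 'end']

# A lexicalized rule stored in the same table corrects that case.
with_rule = build_lexicon(words, {"themend": ["the", "mend"]})
print(segment("themend", with_rule))   # ['the', 'mend']
```

Because the rule is just another key in the table, applying it costs nothing beyond the lookup that maximum matching already performs, which is the property the abstract credits for the speed.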