Automatic comma insertion for Japanese text generation

Authors:
Masaki Murata; Tomohiro Ohno; Shigeki Matsubara
Affiliations:
Nagoya University, Japan; Nagoya University, Japan; Nagoya University, Japan
Venue:
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Year:
2010

Abstract

This paper proposes a method for automatically inserting commas into Japanese text. In Japanese sentences, commas play an important role in explicitly separating constituents such as words and phrases. The method can serve as a component technology for applications that generate text, such as speech recognition and machine translation, or in writing-support tools for non-native speakers. We categorized the usages of commas and investigated how each category tends to appear. In this method, the positions where commas should be inserted are decided by a machine learning approach. In a comma insertion experiment on a text corpus, we confirmed the effectiveness of our method.
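
As a rough illustration of what "deciding comma positions by machine learning" can look like, the sketch below treats every boundary between two phrase units as a binary decision: comma or no comma. The abstract does not name the learner or the features, so everything here is an assumption made for illustration: the perceptron, the single suffix feature, the pre-segmented input, and the three invented training sentences are stand-ins, not the authors' model or data.

```python
# Illustrative sketch only: the learner, the feature, and the toy data are
# assumptions for this example, not details taken from the paper.

COMMA = "、"

# Hypothetical training data: each item is (pre-segmented phrase units,
# set of unit indices that are followed by a comma).
TOY_DATA = [
    (["雨が", "降ったので", "試合は", "中止に", "なった"], {1}),
    (["しかし", "彼は", "来なかった"], {0}),
    (["時間が", "ないので", "先に", "行く"], {1}),
]


def features(segments, i):
    """Features for the boundary between segments[i] and segments[i + 1]."""
    # Assumed feature: the last two characters of the unit left of the boundary.
    return ["bias", "left_suffix=" + segments[i][-2:]]


def score(weights, feats):
    """Sum of the weights of the active features (unseen features count as 0)."""
    return sum(weights.get(f, 0.0) for f in feats)


def train(data, epochs=5):
    """Train a plain perceptron: one binary decision per boundary."""
    weights = {}
    for _ in range(epochs):
        for segments, comma_after in data:
            for i in range(len(segments) - 1):
                feats = features(segments, i)
                predicted = score(weights, feats) > 0
                gold = i in comma_after
                if predicted != gold:
                    # Move the weights toward the gold label on a mistake.
                    step = 1.0 if gold else -1.0
                    for f in feats:
                        weights[f] = weights.get(f, 0.0) + step
    return weights


def insert_commas(segments, weights):
    """Join the units, adding a comma at each boundary the model scores above 0."""
    out = []
    for i, seg in enumerate(segments):
        out.append(seg)
        if i < len(segments) - 1 and score(weights, features(segments, i)) > 0:
            out.append(COMMA)
    return "".join(out)


if __name__ == "__main__":
    w = train(TOY_DATA)
    print(insert_commas(["風が", "強いので", "窓を", "閉めた"], w))
```

With this toy data the unseen sentence receives a comma after 強いので, because the suffix ので was followed by a comma in both training sentences that contain it. A realistic system would need a far richer feature set and a real annotated corpus; the sketch only shows the shape of the per-boundary decision.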