Segmentation standard for Chinese natural language processing

Authors:
Chu-Ren Huang; Keh-Jiann Chen; Li-Li Chang
Affiliations:
Institute of History and Philology, Nankang, Taipei, Taiwan; Institute of Information Science, Academia Sinica, Nankang, Taipei, Taiwan; Institute of Information Science, Academia Sinica, Nankang, Taipei, Taiwan
Venue:
COLING '96: Proceedings of the 16th Conference on Computational Linguistics, Volume 2
Year:
1996

Abstract

This paper proposes a segmentation standard for Chinese natural language processing. The standard is designed to achieve three goals: linguistic felicity, computational feasibility, and data uniformity. Linguistic felicity is maintained by defining the segmentation unit as equivalent to the theoretical definition of a word, and by providing a set of segmentation principles that amount to a functional definition of a word. Computational feasibility is ensured in two ways: these functional definitions are procedural and can therefore be converted into segmentation algorithms, and implementable heuristic guidelines handle specific linguistic categories. Data uniformity is achieved by stratifying the standard itself and by defining a standard lexicon as part of the segmentation standard.
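The abstract states that the segmentation principles can be converted into algorithms and that a standard lexicon is part of the standard, but it does not specify an algorithm. As a rough illustration of how a lexicon can drive segmentation, the sketch below uses forward maximum matching, a common dictionary-based baseline. It is not the authors' method. The function name `forward_maximum_match`, the `max_len` parameter, and the toy lexicon are hypothetical and were chosen only for this example.

```python
from typing import List, Set


def forward_maximum_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy left-to-right segmentation (a generic baseline, assumed here
    for illustration; not the segmentation principles of the paper).

    At each position, take the longest substring (up to the hypothetical
    limit `max_len`) found in `lexicon`; if none matches, emit one character.
    """
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a lexicon hit;
        # a single character is always accepted as a fallback.
        for j in range(min(len(text), i + max_len), i, -1):
            candidate = text[i:j]
            if candidate in lexicon or j == i + 1:
                tokens.append(candidate)
                i = j
                break
    return tokens


if __name__ == "__main__":
    # Hypothetical toy lexicon standing in for a standard lexicon.
    toy_lexicon = {"中文", "自然", "語言", "處理", "自然語言"}
    print(forward_maximum_match("中文自然語言處理", toy_lexicon))
```

With this toy lexicon, `中文自然語言處理` is split into `中文`, `自然語言`, `處理`, because the longer entry `自然語言` takes precedence over `自然` + `語言`. Characters not covered by the lexicon fall back to single-character tokens. A standard that fixes both the lexicon and the principles for resolving such choices is what would make output from different systems uniform.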