The role of lexical resources in CJK natural language processing

Authors:
Jack Halpern
Affiliations:
The CJK Dictionary Institute (CJKI), Niiza-shi, Saitama, Japan
Venue:
MLRI '06 Proceedings of the Workshop on Multilingual Language Resources and Interoperability
Year:
2006




Abstract

The role of lexical resources is often understated in NLP research. The complexity of Chinese, Japanese and Korean (CJK) poses special challenges to developers of NLP tools, especially in the areas of word segmentation (WS), information retrieval (IR), named entity extraction (NER), and machine translation (MT). These difficulties are exacerbated by the lack of comprehensive lexical resources, especially for proper nouns, and the lack of a standardized orthography, especially in Japanese. This paper summarizes some of the major linguistic issues in the development of NLP applications that are dependent on lexical resources, and discusses the central role such resources should play in enhancing the accuracy of NLP tools.