Chinese unknown word identification using character-based tagging and chunking

Authors:
Goh Chooi Ling; Masayuki Asahara; Yuji Matsumoto
Affiliations:
Nara Institute of Science and Technology; Nara Institute of Science and Technology; Nara Institute of Science and Technology
Venue:
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 2
Year:
2003

Abstract

Since written Chinese does not use spaces to delimit words, segmenting Chinese text is an essential task. This task raises the problem of unknown words: it is impossible to register every word in a dictionary, because new words can always be created by combining characters. We propose a unified solution for detecting unknown words in Chinese texts. First, morphological analysis is performed to obtain an initial segmentation and POS tags; then a chunker is used to detect unknown words.
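The two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the morphological analyzer returns (word, POS) pairs and re-expresses that output at the character level with a hypothetical S/B/I/E position tag per character, which is the kind of character-based tagging the title refers to. It also assumes the chunker labels each character B/I/E/S when it belongs to an unknown word and O otherwise. The chunker itself (a trained classifier) is not implemented: its labels are written by hand, and the POS tags in the example are illustrative only.

```python
from typing import List, Tuple

# Assumption: the morphological analyzer yields (surface form, POS tag) pairs.
Word = Tuple[str, str]


def position_tag(index: int, length: int) -> str:
    """Position of a character in its word: S=single, B=begin, I=inside, E=end."""
    if length == 1:
        return "S"
    if index == 0:
        return "B"
    if index == length - 1:
        return "E"
    return "I"


def to_character_features(words: List[Word]) -> List[Tuple[str, str, str]]:
    """Re-express word-level analysis as one (char, POS, position) row per character.

    These rows are the kind of per-character features a chunker could consume.
    """
    rows = []
    for surface, pos in words:
        for i, ch in enumerate(surface):
            rows.append((ch, pos, position_tag(i, len(surface))))
    return rows


def extract_unknown_words(chars: List[str], labels: List[str]) -> List[str]:
    """Collect unknown words from per-character chunk labels.

    Hypothetical label scheme: B/I/E/S mark characters inside an unknown word,
    O marks characters outside one. Spans that never reach E are discarded.
    """
    words: List[str] = []
    current: List[str] = []
    for ch, label in zip(chars, labels):
        if label == "S":
            current = []          # drop any unterminated span
            words.append(ch)
        elif label == "B":
            current = [ch]        # start a new span
        elif label == "I":
            if current:           # ignore I without a preceding B
                current.append(ch)
        elif label == "E":
            if current:
                current.append(ch)
                words.append("".join(current))
            current = []
        else:                     # "O": outside any unknown word
            current = []
    return words


if __name__ == "__main__":
    # The name 张三丰 is assumed missing from the dictionary, so the analyzer
    # splits it into single characters with unreliable POS tags.
    analysis = [("我", "r"), ("认识", "v"), ("张", "nr"), ("三", "m"), ("丰", "a")]
    rows = to_character_features(analysis)
    chars = [ch for ch, _, _ in rows]
    labels = ["O", "O", "O", "B", "I", "E"]  # hand-written stand-in for chunker output
    print(extract_unknown_words(chars, labels))  # prints ['张三丰']
```

Keeping the analyzer's POS and position tags as per-character features lets the chunker decide where an unknown word begins and ends even when the initial segmentation has split it into fragments.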