Unknown word extraction for Chinese documents
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Since written Chinese does not use spaces to delimit words, segmenting Chinese text is an essential task. During segmentation, the problem of unknown words arises: it is impossible to register every word in a dictionary, because new words can always be created by combining characters. We propose a unified solution for detecting unknown words in Chinese texts. First, morphological analysis is performed to obtain an initial segmentation and POS tags; then a chunker is used to detect unknown words.
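The two-stage idea can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the authors' implementation: forward maximum matching against a tiny hypothetical lexicon stands in for the morphological analyzer, POS tags are omitted for brevity, and a simple rule that merges adjacent out-of-lexicon characters stands in for the chunker, which in a real system would be a trained model. The names `LEXICON`, `segment`, and `chunk_unknown` are made up for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical toy lexicon; a real system would use a full dictionary.
LEXICON: Set[str] = {"我", "喜欢", "北京", "的", "天气"}


def segment(text: str, lexicon: Set[str], max_len: int = 4) -> List[Tuple[str, bool]]:
    """Stage 1 stand-in: forward maximum matching.

    Returns (token, in_lexicon) pairs. Characters not covered by any
    lexicon entry are emitted as single-character tokens flagged False.
    """
    tokens: List[Tuple[str, bool]] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if piece in lexicon:
                tokens.append((piece, True))
                i += n
                break
        else:
            # No lexicon entry starts here: emit one unknown character.
            tokens.append((text[i], False))
            i += 1
    return tokens


def chunk_unknown(tokens: List[Tuple[str, bool]]) -> List[str]:
    """Stage 2 stand-in: merge runs of adjacent out-of-lexicon tokens.

    This rule is an assumption made for illustration; it replaces the
    learned chunker described in the abstract.
    """
    candidates: List[str] = []
    run: List[str] = []
    for token, known in tokens:
        if known:
            if run:
                candidates.append("".join(run))
                run = []
        else:
            run.append(token)
    if run:
        candidates.append("".join(run))
    return candidates


if __name__ == "__main__":
    initial = segment("我喜欢哈尔滨的天气", LEXICON)
    print(initial)
    print(chunk_unknown(initial))
```

In this toy, the place name 哈尔滨 is absent from the lexicon, so the first stage splits it into three single characters and the second stage reassembles them into one unknown-word candidate. A learned chunker would make that merging decision from features such as the characters and their tags instead of a fixed rule.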