This paper presents a unified solution, based on the idea of "roles tagging", to the complicated problem of Chinese unknown word recognition. In our approach, an unknown word is identified from its component tokens and its context tokens. To capture the functions of tokens, we use the concept of roles. Roles are tagged by applying the Viterbi algorithm, in the manner of a POS tagger. In the resulting most probable role sequence, all eligible unknown words are recognized through maximum pattern matching. We obtain high precision and recall rates, especially for person names and transliterations. The results and experiments in our system ICTCLAS show that our approach based on roles tagging is simple yet effective.
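The two steps described in the abstract, Viterbi role tagging followed by maximum pattern matching, can be sketched roughly as below. This is a minimal illustration, not the ICTCLAS implementation: the role inventory (A for ordinary context, B for a surname, C and D for the first and second characters of a given name), the name patterns "BCD" and "BC", the probability floor, and every probability in the toy model are assumptions made up for the example, not values from the paper.

```python
import math

def viterbi(tokens, roles, start_p, trans_p, emit_p, floor=1e-8):
    """Return the most probable role sequence for tokens (log-space Viterbi)."""
    def lp(p):
        # Floor unseen events so that log(0) never occurs (an assumed smoothing).
        return math.log(max(p, floor))

    # scores[i][r]: best log-probability of any role path ending in r at token i.
    scores = [{r: lp(start_p.get(r, 0)) + lp(emit_p[r].get(tokens[0], 0))
               for r in roles}]
    back = []
    for tok in tokens[1:]:
        prev, cur, ptr = scores[-1], {}, {}
        for r in roles:
            best = max(roles, key=lambda q: prev[q] + lp(trans_p[q].get(r, 0)))
            cur[r] = (prev[best] + lp(trans_p[best].get(r, 0))
                      + lp(emit_p[r].get(tok, 0)))
            ptr[r] = best
        scores.append(cur)
        back.append(ptr)

    # Follow the back-pointers from the best final role.
    path = [max(roles, key=lambda r: scores[-1][r])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

def match_patterns(tokens, path, patterns):
    """Scan the role string left to right, preferring the longest pattern."""
    role_str = "".join(path)  # assumes every role is a single character
    ordered = sorted(patterns, key=len, reverse=True)
    found, i = [], 0
    while i < len(role_str):
        for pat in ordered:
            if role_str.startswith(pat, i):
                found.append("".join(tokens[i:i + len(pat)]))
                i += len(pat)
                break
        else:
            i += 1
    return found

# Hypothetical toy model; all numbers are invented for illustration.
roles = ["A", "B", "C", "D"]
start_p = {"A": 0.9, "B": 0.1}
trans_p = {
    "A": {"A": 0.6, "B": 0.4},
    "B": {"C": 1.0},
    "C": {"D": 0.7, "A": 0.3},
    "D": {"A": 1.0},
}
emit_p = {
    "A": {"记者": 0.4, "说": 0.4, "小": 0.1, "明": 0.1},
    "B": {"张": 0.9, "明": 0.1},
    "C": {"小": 0.6, "明": 0.4},
    "D": {"明": 0.7, "小": 0.3},
}

tokens = ["记者", "张", "小", "明", "说"]
path = viterbi(tokens, roles, start_p, trans_p, emit_p)
names = match_patterns(tokens, path, ["BCD", "BC"])
print(path, names)
```

With this toy model the decoded roles mark 张 as a surname and 小 and 明 as given-name characters, so the pattern "BCD" recovers 张小明 as a candidate person name. In a real system the role set and the probabilities would be estimated from a role-annotated corpus rather than written by hand.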