A stochastic finite-state word-segmentation algorithm for Chinese
Computational Linguistics
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
A trainable rule-based algorithm for word segmentation
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Chinese word segmentation without using lexicon and hand-crafted training data
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Covering ambiguity resolution in Chinese word segmentation based on contextual information
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Shallow parsing with conditional random fields
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Chinese Word Segmentation and Named Entity Recognition: A Pragmatic Approach
Computational Linguistics
Statistically-enhanced new word identification in a rule-based Chinese system
CLPW '00 Proceedings of the second workshop on Chinese language processing: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 12
A comparison of algorithms for maximum entropy parameter estimation
COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
Unsupervised training for overlapping ambiguity resolution in Chinese word segmentation
SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
Chinese lexical analysis using hierarchical hidden Markov model
SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
The first international Chinese word segmentation Bakeoff
SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
Combining segmenter and chunker for Chinese word segmentation
SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
Chinese word segmentation using minimal linguistic knowledge
SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
Chinese word segmentation as LMR tagging
SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
Adaptive Chinese word segmentation
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Scaling conditional random fields using error-correcting codes
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Chinese segmentation and new word detection using conditional random fields
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Subword-based tagging by conditional random fields for Chinese word segmentation
NAACL-Short '06 Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers
A chunking strategy towards unknown word detection in chinese word segmentation
IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing
Integrating Generative and Discriminative Character-Based Models for Chinese Word Segmentation
ACM Transactions on Asian Language Information Processing (TALIP)
ROCLING '11 Proceedings of the 23rd Conference on Computational Linguistics and Speech Processing
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
ACM Transactions on Asian Language Information Processing (TALIP)
Unknown Chinese word extraction based on variety of overlapping strings
Information Processing and Management: an International Journal
Probabilistic Chinese word segmentation with non-local information and stochastic training
Information Processing and Management: an International Journal
Design and analysis of genetic algorithm based Chinese keyword extracting
International Journal of Computer Applications in Technology
Hi-index | 0.00 |
Chinese word segmentation is an active area in Chinese language processing though it is suffering from the argument about what precisely is a word in Chinese. Based on corpus-based segmentation standard, we launched this study. In detail, we regard Chinese word segmentation as a character-based tagging problem. We show that there has been a potent trend of using a character-based tagging approach in this field. In particular, learning from segmented corpus with or without additional linguistic resources is treated in a unified way in which the only difference depends on how the feature template set is selected. It differs from existing work in that both feature template selection and tag set selection are considered in our approach, instead of the previous feature template focus only technique. We show that there is a significant performance difference as different tag sets are selected. This is especially applied to a six-tag set, which is good enough for most current segmented corpora. The linguistic meaning of a tag set is also discussed. Our results show that a simple learning system with six n-gram feature templates and a six-tag set can obtain competitive performance in the cases of learning only from a training corpus. In cases when additional linguistic resources are available, an ensemble learning technique, assistant segmenter, is proposed and its effectiveness is verified. Assistant segmenter is also proven to be an effective method as segmentation standard adaptation that outperforms existing ones. Based on the proposed approach, our system provides state-of-the-art performance in all 12 corpora of three international Chinese word segmentation bakeoffs.