Chinese text segmentation for text retrieval: achievements and problems
Journal of the American Society for Information Science
Inducing Features of Random Fields
IEEE Transactions on Pattern Analysis and Machine Intelligence
Foundations of statistical natural language processing
Foundations of statistical natural language processing
Machine Learning
Pattern Classification (2nd Edition)
Pattern Classification (2nd Edition)
Distribution of content words and phrases in text and language modelling
Natural Language Engineering
A trainable rule-based algorithm for word segmentation
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Minimum error rate training in statistical machine translation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Improved source-channel models for Chinese word segmentation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Distribution-based pruning of backoff language models
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Statistically-enhanced new word identification in a rule-based Chinese system
CLPW '00 Proceedings of the second workshop on Chinese language processing: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 12
The first international Chinese word segmentation Bakeoff
SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
The use of SVM for chinese new word identification
IJCNLP'04 Proceedings of the First international joint conference on Natural Language Processing
Chinese Word Segmentation and Named Entity Recognition: A Pragmatic Approach
Computational Linguistics
Subword-based tagging for confidence-dependent Chinese word segmentation
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Chinese word segmentation and statistical machine translation
ACM Transactions on Speech and Language Processing (TSLP)
ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
TBL-improved non-deterministic segmentation and POS tagging for a Chinese parser
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
A dual-layer CRFs based joint decoding method for cascaded segmentation and labeling tasks
IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Improved statistical machine translation by multiple Chinese word segmentation
StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Expert Systems with Applications: An International Journal
A Unified Character-Based Tagging Framework for Chinese Word Segmentation
ACM Transactions on Asian Language Information Processing (TALIP)
ICCOMP'06 Proceedings of the 10th WSEAS international conference on Computers
Parsing the internal structure of words: a new paradigm for Chinese word segmentation
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Feature selection strategies for automated classification of digital media content
Journal of Information Science
A lexicon-constrained character model for chinese morphological analysis
IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing
Boosting-based ensemble learning with penalty profiles for automatic Thai unknown word recognition
Computers & Mathematics with Applications
Iterative annotation transformation with predict-self reestimation for Chinese word segmentation
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Hi-index | 0.00 |
This paper presents a Chinese word segmentation system which can adapt to different domains and standards. We first present a statistical framework where domain-specific words are identified in a unified approach to word segmentation based on linear models. We explore several features and describe how to create training data by sampling. We then describe a transformation-based learning method used to adapt our system to different word segmentation standards. Evaluation of the proposed system on five test sets with different standards shows that the system achieves state- of-the-art performance on all of them.