The paper presents a maximum entropy Chinese character-based parser trained on the Chinese Treebank ("CTB" henceforth). Word-based parse trees in CTB are first converted into character-based trees, where word-level part-of-speech (POS) tags become constituent labels and character-level tags are derived from word-level POS tags. A maximum entropy parser is then trained on the character-based corpus. The parser performs word segmentation, POS tagging, and parsing in a unified framework. It achieves an average label F-measure of 81.4% and a word-segmentation F-measure of 96.0%. Our results show that word-level POS tags can significantly improve word segmentation, but higher-level syntactic structures are of little use to word segmentation in the maximum entropy parser. A word dictionary helps to improve both word-segmentation and parsing accuracy.
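
The word-to-character tree conversion described in the abstract can be illustrated with a minimal Python sketch. The abstract states only that character-level tags are derived from word-level POS tags. The positional suffixes used here (`b`, `m`, `e` for the beginning, middle, and end of a word, and `u` for a single-character word), along with the function names `chars_with_tags`, `word_to_char_subtree`, and `words_from_char_tags`, are assumptions made for illustration and may differ from the paper's actual tag set.

```python
from typing import List, Tuple

CharTag = Tuple[str, str]  # (character, character-level tag)


def chars_with_tags(word: str, pos: str) -> List[CharTag]:
    """Derive one tag per character from the word-level POS tag.

    Assumed suffix scheme (not taken from the paper):
    u = single-character word, b/m/e = begin/middle/end of a word.
    """
    if len(word) == 1:
        return [(word, pos + "u")]
    tags = [pos + "b"] + [pos + "m"] * (len(word) - 2) + [pos + "e"]
    return list(zip(word, tags))


def word_to_char_subtree(word: str, pos: str) -> Tuple[str, List[CharTag]]:
    """Turn a (word, POS) leaf into a subtree whose label is the POS tag
    and whose leaves are the tagged characters."""
    return (pos, chars_with_tags(word, pos))


def words_from_char_tags(tagged: List[CharTag]) -> List[str]:
    """Recover the word segmentation from character-level tags:
    a word ends at a character whose tag ends in u or e."""
    words, current = [], ""
    for char, tag in tagged:
        current += char
        if tag[-1] in ("u", "e"):
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends mid-word
        words.append(current)
    return words


# Hypothetical example: a two-word sentence with word-level POS tags.
sentence = [("中国", "NR"), ("人", "NN")]
subtrees = [word_to_char_subtree(w, p) for w, p in sentence]
flat = [ct for _, leaves in subtrees for ct in leaves]
print(subtrees)
print(words_from_char_tags(flat))
```

Because each character tag encodes both the word's POS and the character's position in the word, reading the tags back off a character-based tree yields the segmentation and the POS tags at once, which is what allows a single parser to handle all three tasks.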