A maximum entropy Chinese character-based parser

Authors:
Xiaoqiang Luo
Affiliations:
IBM T.J. Watson Research Center, Yorktown Heights, NY
Venue:
EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
Year:
2003



Abstract

The paper presents a maximum entropy Chinese character-based parser trained on the Chinese Treebank ("CTB" henceforth). Word-based parse trees in CTB are first converted into character-based trees, where word-level part-of-speech (POS) tags become constituent labels and character-level tags are derived from word-level POS tags. A maximum entropy parser is then trained on the character-based corpus. The parser performs word segmentation, POS tagging, and parsing in a unified framework. The parser achieves an average label F-measure of 81.4% and a word-segmentation F-measure of 96.0%. Our results show that word-level POS tags can significantly improve word segmentation, but higher-level syntactic structures are of little use to word segmentation in the maximum entropy parser. A word dictionary helps to improve both word-segmentation and parsing accuracy.
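
The word-to-character conversion described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the position suffixes (`b`, `m`, `e`, `s`), the function names, and the example phrase are assumptions chosen for illustration. The sketch shows only how a word-level POS tag can become a constituent label over character leaves, and how word boundaries and POS tags can be read back from character-level tags.

```python
from typing import List, Tuple

# Assumed tag scheme (hypothetical, for illustration only):
# each character gets "<word POS>-<position>", where position is
# b (begin), m (middle), e (end), or s (single-character word).

def char_tags(word: str, pos: str) -> List[Tuple[str, str]]:
    """Derive character-level tags from a word-level POS tag."""
    n = len(word)
    if n == 1:
        return [(word, f"{pos}-s")]
    tagged = []
    for i, ch in enumerate(word):
        if i == 0:
            suffix = "b"
        elif i == n - 1:
            suffix = "e"
        else:
            suffix = "m"
        tagged.append((ch, f"{pos}-{suffix}"))
    return tagged

def word_to_subtree(word: str, pos: str) -> Tuple[str, List[Tuple[str, str]]]:
    """The word-level POS tag becomes the label of a constituent
    whose children are the tagged characters."""
    return (pos, char_tags(word, pos))

def words_from_char_tags(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Recover (word, POS) pairs from a sequence of character-level tags."""
    words, buffer = [], []
    for ch, tag in tagged:
        pos, position = tag.rsplit("-", 1)
        buffer.append(ch)
        if position in ("e", "s"):  # a word ends here
            words.append(("".join(buffer), pos))
            buffer = []
    return words

# Example phrase (an assumption, not taken from the paper).
sentence = [("中国", "NR"), ("的", "DEG"), ("科学家", "NN")]
flat = [ct for word, pos in sentence for ct in char_tags(word, pos)]
recovered = words_from_char_tags(flat)
```

Because the character tags encode both the word's POS tag and the character's position, a character-level parse determines the segmentation and the POS tags at the same time, which is the sense in which the three tasks share one framework.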