A fast, accurate deterministic parser for Chinese

Authors:
Mengqiu Wang;Kenji Sagae;Teruko Mitamura
Affiliations:
Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University
Venue:
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Year:
2006

Citing 16
Cited 15

Natural language parsing as statistical pattern recognition

Natural language parsing as statistical pattern recognition
Learning to Parse Natural Language with Maximum Entropy Models

Machine Learning - Special issue on natural language learning
Unpacking Multi-valued Symbolic Features and Classes in Memory-Based Language Learning

ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Exploiting diversity for natural language parsing

Exploiting diversity for natural language parsing
A maximum-entropy chinese parser augmented by transformation-based learning

ACM Transactions on Asian Language Information Processing (TALIP)
The Penn Chinese TreeBank: Phrase structure annotation of a large corpus

Natural Language Engineering
Recovering latent information in treebanks

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Is it harder to parse Chinese, or the Chinese Treebank?

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
On the parameter space of generative lexicalized statistical parsing models

On the parameter space of generative lexicalized statistical parsing models
Use of support vector learning for chunk identification

ConLL '00 Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning - Volume 7
Two statistical parsing models applied to the Chinese Treebank

CLPW '00 Proceedings of the second workshop on Chinese language processing: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 12
The effect of rhythm on structural disambiguation in Chinese

SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
A maximum entropy Chinese character-based parser

EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
Deterministic dependency parsing of English text

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
A classifier-based parser with linear run-time complexity

Parsing '05 Proceedings of the Ninth International Workshop on Parsing Technology
Parsing the penn chinese treebank with semantic knowledge

IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing

Probabilistic Models for Action-Based Chinese Dependency Parsing

ECML '07 Proceedings of the 18th European conference on Machine Learning
Using Short Dependency Relations from Auto-Parsed Data for Chinese Dependency Parsing

ACM Transactions on Asian Language Information Processing (TALIP)
A puristic approach for joint dependency parsing and semantic role labeling

CoNLL '08 Proceedings of the Twelfth Conference on Computational Natural Language Learning
Ungreedy methods for Chinese deterministic dependency parsing

AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 2
Accurate learning for Chinese function tags from minimal features

ACLstudent '09 Proceedings of the ACL-IJCNLP 2009 Student Research Workshop
Exploiting heterogeneous treebanks for parsing

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Transition-based parsing of the Chinese treebank using a global discriminative model

IWPT '09 Proceedings of the 11th International Conference on Parsing Technologies
Chinese novelty mining

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3
Probabilistic models for answer-ranking in multilingual question-answering

ACM Transactions on Information Systems (TOIS)
A deterministic method to predict phrase boundaries of a syntactic tree

ICIC'10 Proceedings of the Advanced intelligent computing theories and applications, and 6th international conference on Intelligent computing
Parsing the internal structure of words: a new paradigm for Chinese word segmentation

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Better automatic treebank conversion using a feature-based approach

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Multilingual sentence categorization and novelty mining

Information Processing and Management: an International Journal
The challenges of parsing Chinese with combinatory categorial grammar

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Capturing paradigmatic and syntagmatic lexical relations: towards accurate Chinese part-of-speech tagging

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1

Quantified Score

Hi-index	0.00

Visualization

Abstract

We present a novel classifier-based deterministic parser for Chinese constituency parsing. Our parser computes parse trees from bottom up in one pass, and uses classifiers to make shift-reduce decisions. Trained and evaluated on the standard training and test sets, our best model (using stacked classifiers) runs in linear time and has labeled precision and recall above 88% using gold-standard part-of-speech tags, surpassing the best published results. Our SVM parser is 2-13 times faster than state-of-the-art parsers, while producing more accurate results. Our Maxent and DTree parsers run at speeds 40-270 times faster than state-of-the-art parsers, but with 5-6% losses in accuracy.