An analysis of tree topological features in classifier-based unlexicalized parsing

Authors:
Samuel W. K. Chan;Mickey W. C. Chong;Lawrence Y. L. Cheung
Affiliations:
Dept. of Decision Sciences, Chinese University of Hong Kong, Shatin, Hong Kong SAR;Dept. of Decision Sciences, Chinese University of Hong Kong, Shatin, Hong Kong SAR;Dept. of Linguistics & Modern Languages, Chinese University of Hong Kong, Shatin, Hong Kong SAR
Venue:
CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part I
Year:
2011

Abstract

A novel set of "tree topological features" (TTFs) is investigated for improving a classifier-based unlexicalized parser. The features capture the location and shape of subtrees in the treebank. Four main categories of TTFs are proposed and compared. Experimental results show that each of the four categories, used on its own, significantly improves parsing accuracy over the baseline model. When the categories are combined using an ensemble technique, the best unlexicalized parser achieves 84% accuracy without any extra language resources, matching the performance of early lexicalized parsers. Linguistically, TTFs approximate notions such as grammatical weight, branching properties, and structural parallelism. This is illustrated by studying how the features capture structural parallelism in the processing of coordinate structures.
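
The abstract does not define the four TTF categories, so the following is only a minimal sketch of what "location and shape" descriptors of a subtree could look like, and of how a shape comparison might flag structural parallelism between conjuncts. The tree encoding, the function names (`height`, `num_leaves`, `skeleton`, `shape_features`), and the chosen descriptors are all assumptions made for illustration; they are not the paper's feature set.

```python
from typing import Dict, Tuple, Union

# Assumed encoding (not from the paper): a leaf is a POS-tag string, and an
# internal node is a tuple (label, child_1, ..., child_n) with n >= 1.
# No words appear, in keeping with an unlexicalized setting.
Tree = Union[str, Tuple]


def height(tree: Tree) -> int:
    """Number of edges on the longest path from this node down to a leaf."""
    if isinstance(tree, str):
        return 0
    return 1 + max(height(child) for child in tree[1:])


def num_leaves(tree: Tree) -> int:
    """Number of POS-tag leaves the subtree spans (a rough 'weight')."""
    if isinstance(tree, str):
        return 1
    return sum(num_leaves(child) for child in tree[1:])


def skeleton(tree: Tree) -> str:
    """Label-free bracketing: '.' for a leaf, '(...)' for an internal node."""
    if isinstance(tree, str):
        return "."
    return "(" + "".join(skeleton(child) for child in tree[1:]) + ")"


def shape_features(tree: Tree, start: int) -> Dict[str, object]:
    """Hypothetical location-and-shape descriptors for one subtree.

    `start` is the index of the subtree's first leaf in the sentence.
    """
    return {
        "start": start,
        "width": num_leaves(tree),
        "height": height(tree),
        "skeleton": skeleton(tree),
    }


# A coordinate structure with two conjuncts of identical shape.
coord = ("NP", ("NP", "DT", "NN"), "CC", ("NP", "DT", "NN"))
left_feats = shape_features(coord[1], start=0)
right_feats = shape_features(coord[3], start=3)

# Conjuncts whose label-free skeletons match are treated as parallel here.
parallel = left_feats["skeleton"] == right_feats["skeleton"]
```

In this sketch the two conjuncts share the same skeleton, so `parallel` is true; a classifier could in principle receive such descriptors as extra inputs alongside its usual category features, although how the paper actually encodes and combines them is not stated in the abstract.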