Hybrid combination of constituency and dependency trees into an ensemble dependency parser

Authors:
Nathan David Green;Zdeněk Žabokrtský
Affiliations:
Charles University in Prague, Prague, Czech Republic;Charles University in Prague, Prague, Czech Republic
Venue:
HYBRID '12 Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data
Year:
2012

Citing 16
Cited 1

Ensemble Methods in Machine Learning

MCS '00 Proceedings of the First International Workshop on Multiple Classifier Systems
Building a large annotated corpus of English: the penn treebank

Computational Linguistics - Special issue on using large corpora: II
Three new probabilistic models for dependency parsing: an exploration

COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
Accurate unlexicalized parsing

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Coarse-to-fine n-best parsing and MaxEnt discriminative reranking

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Non-projective dependency parsing using spanning tree algorithms

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Dependency Parsing

Dependency Parsing
CoNLL-X shared task on multilingual dependency parsing

CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
Parser combination by reparsing

NAACL-Short '06 Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers
TectoMT: highly modular MT system with tectogrammatics used as transfer layer

StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Improving parsing accuracy by combining diverse dependency parsers

Parsing '05 Proceedings of the Ninth International Workshop on Parsing Technology
Ensemble models for dependency parsing: cheap and good?

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Syntactic processing using the generalized perceptron and beam search

Computational Linguistics
An ensemble model that combines syntactic and semantic clustering for discriminative dependency parsing

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Influence of parser choice on dependency-based MT

WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Data-Driven part-of-speech tagging of kiswahili

TSD'06 Proceedings of the 9th international conference on Text, Speech and Dialogue

Combine constituent and dependency parsing via reranking

IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence

Quantified Score

Hi-index	0.00

Visualization

Abstract

Dependency parsing has made many advancements in recent years, in particular for English. There are a few dependency parsers that achieve comparable accuracy scores with each other but with very different types of errors. This paper examines creating a new dependency structure through ensemble learning using a hybrid of the outputs of various parsers. We combine all tree outputs into a weighted edge graph, using 4 weighting mechanisms. The weighted edge graph is the input into our ensemble system and is a hybrid of very different parsing techniques (constituent parsers, transition-based dependency parsers, and a graph-based parser). From this graph we take a maximum spanning tree. We examine the new dependency structure in terms of accuracy and errors on individual part-of-speech values. The results indicate that using a greater number of more varied parsers will improve accuracy results. The combined ensemble system, using 5 parsers based on 3 different parsing techniques, achieves an accuracy score of 92.58%, beating all single parsers on the Wall Street Journal section 23 test set. Additionally, the ensemble system reduces the average relative error on selected POS tags by 9.82%.