Grammar-driven versus data-driven: which parsing system is more affected by domain shifts?

Authors:
Barbara Plank; Gertjan van Noord
Affiliations:
University of Groningen, The Netherlands; University of Groningen, The Netherlands
Venue:
NLPLING '10 Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground
Year:
2010

Abstract

In the past decade, several parsing systems for natural language have emerged that use different methods and formalisms; for instance, some systems employ a handcrafted grammar and a statistical disambiguation component, while others are purely statistical and data-driven. What they have in common is a lack of portability to new domains: their performance might decrease substantially as the distance between test and training domain increases. Yet to what degree do they suffer from this problem, i.e., which kind of parsing system is more affected by domain shifts? Intuitively, grammar-driven systems should be less affected by domain changes. To test this hypothesis, an empirical investigation on Dutch is carried out. The performance variation of one grammar-driven and two data-driven systems across domains is evaluated, and a simple measure to quantify domain sensitivity is proposed. This gives an estimate of which parsing system is more affected by domain shifts, and thus more in need of adaptation techniques.