Artificial Intelligence - Special volume on empirical methods.
Information Retrieval.
Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics - Special issue on using large corpora: II.
The interface between phrasal and functional constraints. Computational Linguistics.
Supertagging: an approach to almost parsing. Computational Linguistics.
The LinGO Redwoods treebank: motivation and preliminary applications. COLING '02 Proceedings of the 19th International Conference on Computational Linguistics - Volume 2.
ACL '02 Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics.
Feature-rich part-of-speech tagging with a cyclic dependency network. NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1.
The Penn Treebank: annotating predicate argument structure. HLT '94 Proceedings of the Workshop on Human Language Technology.
COLING-GEE '02 Proceedings of the 2002 Workshop on Grammar Engineering and Evaluation - Volume 15.
The importance of supertagging for wide-coverage CCG parsing. COLING '04 Proceedings of the 20th International Conference on Computational Linguistics.
Does tagging help parsing? A case study on finite-state parsing. FSMNLP '09 Proceedings of the International Workshop on Finite State Methods in Natural Language Processing.
Unsupervised parse selection for HPSG. EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing.
Folk wisdom holds that incorporating a part-of-speech tagger into a system that performs deep linguistic analysis will improve the system's speed and accuracy. Previous studies have tested this belief by incorporating an existing tagger into a parsing system and observing the effect on the parser's speed and the accuracy of its results. However, little work has been done to determine, in a fine-grained manner, exactly how much tagging can disambiguate or reduce ambiguity in parser output. We take a new approach to this question: we examine the full parse-forest output of a large-scale LFG-based English grammar (Riezler et al. (2002)) running on the XLE grammar development platform (Maxwell and Kaplan (1993); Maxwell and Kaplan (1996)), and partition the parses of each sentence into equivalence classes based on their tag sequences. If we find a large number of tag-sequence equivalence classes per sentence, we can conclude that different parses tend to be distinguished by their tags; a small number means that tagging would not help much in reducing ambiguity. In this way, we can determine how much tagging would help in the best case, i.e., if a “perfect tagger” supplied the correct tag sequence for each sentence. We show that a perfect tagger would yield a reduction in ambiguity of about 50%. Somewhat surprisingly, about 30% of the sentences in the corpus we examined would not be disambiguated at all, even by a perfect tagger, because all of their parses share the same tag sequence. Our study also informs research on tagging by identifying exactly which tags help the most in disambiguation.
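The equivalence-class analysis described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual code: it assumes a simplified representation in which each parse is a (tag_sequence, tree) pair, with the tag sequence given as a tuple of POS tags, and in which the gold parse for each sentence is known by index. A "perfect tagger" then keeps exactly those parses that share the gold parse's tag sequence.

```python
from collections import defaultdict

def tag_sequence_classes(parses):
    """Partition one sentence's parse forest into equivalence classes
    keyed by tag sequence.  `parses` is a list of (tags, tree) pairs,
    where `tags` is a tuple of POS tags (illustrative representation)."""
    classes = defaultdict(list)
    for tags, tree in parses:
        classes[tags].append(tree)
    return dict(classes)

def perfect_tagger_stats(sentences):
    """`sentences` is a list of (parses, gold_index) pairs.  A perfect
    tagger prunes every parse whose tag sequence differs from the gold
    parse's.  Returns (fraction of parses pruned, fraction of sentences
    whose forest collapses to a single equivalence class and so cannot
    be disambiguated by tagging at all)."""
    total = kept = single_class = 0
    for parses, gold_index in sentences:
        classes = tag_sequence_classes(parses)
        gold_tags = parses[gold_index][0]
        total += len(parses)
        kept += len(classes[gold_tags])
        if len(classes) == 1:
            single_class += 1
    return 1 - kept / total, single_class / len(sentences)
```

On a toy corpus of two sentences, one with two tag-sequence classes among three parses and one whose two parses share a single tag sequence, the function reports 20% of parses pruned and 50% of sentences left fully ambiguous; the paper's corresponding corpus-level figures are roughly 50% and 30%.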