Ambiguous part-of-speech tagging for improving accuracy and domain portability of syntactic parsers

Authors:
Kazuhiro Yoshida;Yoshimasa Tsuruoka;Yusuke Miyao;Jun'ichi Tsujii
Affiliations:
Department of Computer Science, University of Tokyo;School of Informatics, University of Manchester;Department of Computer Science, University of Tokyo;Department of Computer Science, University of Tokyo and School of Informatics, University of Manchester and National Center for Text Mining
Venue:
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
Year:
2007

Abstract

We aim to improve the performance of a syntactic parser that uses a part-of-speech (POS) tagger as a preprocessor. Pipelined parsers consisting of a POS tagger and a syntactic parser have several advantages, such as the ability to adapt to new domains. However, the performance of such systems on raw text tends to be disappointing because they are affected by the errors of automatic POS tagging. We attempt to compensate for the decrease in accuracy caused by automatic taggers by allowing the tagger to output multiple answers when a tag cannot be determined reliably enough. We empirically verify the effectiveness of the method using an HPSG parser trained on the Penn Treebank. Our results show that ambiguous POS tagging improves parsing if the outputs of the tagger are weighted by probability values, and the results support previous studies with similar intentions. We also examine the effectiveness of our method for adapting the parser to the GENIA corpus and show that the use of ambiguous POS taggers can help the development of portable parsers while keeping accuracy high.
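
The idea of letting a tagger emit several weighted tags for uncertain words can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the criterion for "reliably enough", so the relative-probability cutoff below, the function name `ambiguous_tags`, the parameter `ratio`, and the per-word tag distributions are all assumptions made for illustration.

```python
import math

def ambiguous_tags(tag_probs, ratio=0.1):
    """Return the tags worth passing on to a parser, with weights.

    tag_probs: dict mapping a POS tag to the tagger's probability for it.
    ratio:     hypothetical cutoff; a tag is kept if its probability is at
               least `ratio` times that of the most probable tag.
    Returns a list of (tag, log_probability) pairs, best tag first.
    """
    best = max(tag_probs.values())
    kept = {tag: p for tag, p in tag_probs.items() if p >= ratio * best}
    # The log-probability serves as a weight a parser could add to its score.
    weighted = [(tag, math.log(p)) for tag, p in kept.items()]
    return sorted(weighted, key=lambda pair: -pair[1])

# Made-up tagger output for a three-word sentence (assumed numbers).
sentence = [
    ("Time", {"NN": 0.90, "VB": 0.06, "JJ": 0.04}),
    ("flies", {"VBZ": 0.55, "NNS": 0.45}),
    ("fast", {"RB": 0.70, "JJ": 0.29, "NN": 0.01}),
]

# Confident words keep one tag; uncertain words keep several weighted tags.
lattice = [(word, ambiguous_tags(probs)) for word, probs in sentence]

for word, tags in lattice:
    print(word, [(tag, round(weight, 3)) for tag, weight in tags])
```

With these numbers, "Time" keeps a single tag while "flies" and "fast" each keep two, so the parser is left to resolve only the uncertain cases. Raising `ratio` toward 1 recovers ordinary single-tag output.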