Evaluating the effects of treebank size in a practical application for parsing

  • Authors:
  • Kenji Sagae; Yusuke Miyao; Rune Sætre; Jun'ichi Tsujii

  • Affiliations:
  • University of Tokyo, Japan; University of Tokyo, Japan; University of Tokyo, Japan; University of Tokyo, Japan, University of Manchester, and National Centre for Text Mining, Manchester, UK

  • Venue:
  • SETQA-NLP '08: Workshop on Software Engineering, Testing, and Quality Assurance for Natural Language Processing
  • Year:
  • 2008

Abstract

Natural language processing modules such as part-of-speech taggers, named-entity recognizers, and syntactic parsers are commonly evaluated in isolation, under the assumption that intrinsic evaluation metrics for individual components are predictive of the practical performance of the more complex language technology systems built from them. Although this is an important issue in the design and engineering of systems that use natural language input, it is often unclear how the accuracy of an end-user application is affected by parameters that govern individual NLP modules. We explore this issue in the context of a specific task by examining the relationship between the accuracy of a syntactic parser and the overall performance of an information extraction system for biomedical text that includes the parser as one of its components. We present an empirical investigation of factors that affect the accuracy of syntactic analysis, and of how differences in parse accuracy affect the overall system.