Integrated NLP evaluation system for pluggable evaluation metrics with extensive interoperable toolkit

Authors:
Yoshinobu Kano;Luke McCrohon;Sophia Ananiadou;Jun'ichi Tsujii
Affiliations:
University of Tokyo, Bunkyo-ku, Tokyo;University of Tokyo, Bunkyo-ku, Tokyo;University of Manchester and National Centre for Text Mining, UK;University of Tokyo, Bunkyo-ku, Tokyo and University of Manchester and National Centre for Text Mining, UK
Venue:
SETQA-NLP '09 Proceedings of the Workshop on Software Engineering, Testing, and Quality Assurance for Natural Language Processing
Year:
2009

Abstract

To understand the key characteristics of NLP tools, it is important to evaluate them and compare them against other tools. Because NLP applications tend to consist of multiple semi-independent sub-components, evaluating complete systems is not always enough; a fine-grained evaluation of the underlying components is often worthwhile as well. Standardization of NLP components and resources matters not only for reusability but also because it allows individual components to be compared in terms of reliability and robustness across a wider range of target domains. However, since many evaluation metrics exist even within a single domain, any system that aims to support inter-domain evaluation needs not only predefined metrics but also pluggable, user-defined ones. Such a system needs to be based on an open standard so that a large number of components can be compared, and it would ideally visualize the differences between components. We have developed a pluggable evaluation system based on the UIMA framework that provides visualizations useful for error analysis. It is a single integrated system that includes a large, ready-to-use, fully interoperable library of NLP tools.
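
The idea of pluggable, user-defined metrics can be illustrated with a minimal sketch. This is not the system's actual UIMA-based implementation, which the abstract does not detail. Every name here (Span, METRICS, register, evaluate, overlap_recall, and the tagger names and offsets) is hypothetical and chosen only for illustration. The sketch assumes that component output can be reduced to sets of labelled character spans and that a metric is any function from a gold set and a predicted set to a number.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set


@dataclass(frozen=True)
class Span:
    """A labelled character span (hypothetical stand-in for an annotation)."""
    begin: int
    end: int
    label: str


# A metric is any function of (gold spans, predicted spans) returning a score.
Metric = Callable[[Set[Span], Set[Span]], float]
METRICS: Dict[str, Metric] = {}


def register(name: str) -> Callable[[Metric], Metric]:
    """Decorator that plugs a metric into the registry under the given name."""
    def wrap(fn: Metric) -> Metric:
        METRICS[name] = fn
        return fn
    return wrap


# Predefined metrics based on exact span matching.
@register("precision")
def precision(gold: Set[Span], pred: Set[Span]) -> float:
    return len(gold & pred) / len(pred) if pred else 0.0


@register("recall")
def recall(gold: Set[Span], pred: Set[Span]) -> float:
    return len(gold & pred) / len(gold) if gold else 0.0


@register("f1")
def f1(gold: Set[Span], pred: Set[Span]) -> float:
    p, r = precision(gold, pred), recall(gold, pred)
    return 2 * p * r / (p + r) if p + r else 0.0


# A user-defined metric: a gold span counts as found if any predicted span
# with the same label overlaps it by at least one character.
@register("overlap_recall")
def overlap_recall(gold: Set[Span], pred: Set[Span]) -> float:
    found = sum(
        1 for g in gold
        if any(g.label == p.label and g.begin < p.end and p.begin < g.end
               for p in pred)
    )
    return found / len(gold) if gold else 0.0


def evaluate(gold: Set[Span],
             outputs: Dict[str, Set[Span]]) -> Dict[str, Dict[str, float]]:
    """Score every component's output with every registered metric."""
    return {
        component: {name: fn(gold, pred) for name, fn in METRICS.items()}
        for component, pred in outputs.items()
    }


# Illustrative data only: two hypothetical taggers compared on one gold set.
gold = {Span(0, 5, "protein"), Span(10, 14, "protein")}
outputs = {
    "tagger_a": {Span(0, 5, "protein")},
    "tagger_b": {Span(0, 5, "protein"), Span(9, 14, "protein"),
                 Span(20, 25, "protein")},
}
scores = evaluate(gold, outputs)
for component, result in scores.items():
    print(component, result)
```

In this toy comparison the two taggers have the same exact-match recall, but the user-defined overlap metric separates them, because tagger_b's second span misses the gold boundary by one character. Differences of this kind are what per-component comparison with custom metrics is meant to surface.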