Toward a task-based gold standard for evaluation of NP chunks and technical terms

  • Authors:
  • Nina Wacholder; Peng Song

  • Affiliations:
  • Rutgers University; Rutgers University

  • Venue:
  • NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
  • Year:
  • 2003

Abstract

We propose a gold standard for evaluating two types of information extraction output -- noun phrase (NP) chunks (Abney 1991; Ramshaw and Marcus 1995) and technical terms (Justeson and Katz 1995; Daille 2000; Jacquemin 2002). The gold standard is built around the notion that since different semantic and syntactic variants of terms are arguably correct, a fully satisfactory assessment of the quality of the output must include task-based evaluation. We conducted an experiment that assessed subjects' choice of index terms in an information access task. Subjects showed a significant preference for index terms that are longer, as measured by number of words, and more complex, as measured by number of prepositions. These terms, which were identified by a human indexer, serve as the gold standard. The experimental protocol is a reliable and rigorous method for evaluating the quality of a set of terms. An important advantage of this task-based evaluation is that a set of index terms that differs from the gold standard can 'win' by providing better information access than the gold standard itself does. Although the individual human subject experiments are time consuming, the experimental interface, test materials, and data analysis programs are completely reusable.
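The two surface measures mentioned in the abstract -- term length in words and number of prepositions -- can be computed straightforwardly. The sketch below is only an illustration of those measures; the preposition list and function names are assumptions for exposition, not the authors' actual experimental protocol.

```python
# Minimal sketch: scoring candidate index terms by the two surface
# measures discussed in the abstract -- length in words and number of
# prepositions. The preposition list is an illustrative assumption.

PREPOSITIONS = {"of", "in", "for", "on", "with", "to", "by", "at", "from"}

def term_measures(term: str) -> tuple[int, int]:
    """Return (word_count, preposition_count) for a candidate index term."""
    words = term.lower().split()
    return len(words), sum(1 for w in words if w in PREPOSITIONS)

if __name__ == "__main__":
    candidates = [
        "noun phrase",
        "evaluation of noun phrase chunks",
        "task-based gold standard for technical terms",
    ]
    for t in candidates:
        n_words, n_preps = term_measures(t)
        print(f"{t!r}: {n_words} words, {n_preps} prepositions")
```

Longer, preposition-bearing terms score higher on both measures, which is the pattern the subjects in the experiment were reported to prefer.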