We propose a gold standard for evaluating two types of information extraction output: noun phrase (NP) chunks (Abney 1991; Ramshaw and Marcus 1995) and technical terms (Justeson and Katz 1995; Daille 2000; Jacquemin 2002). The gold standard rests on the observation that different semantic and syntactic variants of a term can arguably all be correct. For that reason, a fully satisfactory assessment of output quality must include task-based evaluation. We conducted an experiment that assessed subjects' choice of index terms in an information access task. Subjects showed a significant preference for index terms that are longer, as measured by number of words, and more complex, as measured by number of prepositions. These terms, which were identified by a human indexer, serve as the gold standard. The experimental protocol is a reliable and rigorous method for evaluating the quality of a set of terms. An important advantage of this task-based evaluation is that a set of index terms that differs from the gold standard can 'win' by providing better information access than the gold standard itself does. Although the individual human-subject experiments are time-consuming, the experimental interface, test materials, and data analysis programs are completely reusable.
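The two term properties that subjects preferred, length and complexity, can be made concrete with a small amount of code. The sketch below is a minimal illustration, not the authors' analysis program. It assumes whitespace tokenization and uses a hypothetical, abbreviated preposition list, because the study's actual tokenization and preposition inventory are not given here. For each candidate index term it computes length as the number of words and complexity as the number of prepositions. It also averages both measures over a term set so that, for example, a human-indexed set and an automatically extracted set can be profiled side by side.

```python
from statistics import mean

# ASSUMPTION: hypothetical, abbreviated preposition list for illustration only;
# the preposition inventory used in the original study is not specified.
PREPOSITIONS = frozenset({
    "of", "in", "on", "for", "with", "by", "to", "from", "at", "about",
    "between", "into", "through", "under", "over",
})


def _words(term: str) -> list[str]:
    """Split a term on whitespace, lowercase it, and trim surrounding punctuation (assumed tokenization)."""
    return [w.strip(".,;:()\"'").lower() for w in term.split() if w.strip(".,;:()\"'")]


def term_length(term: str) -> int:
    """Length of an index term: its number of words."""
    return len(_words(term))


def term_complexity(term: str) -> int:
    """Complexity of an index term: the number of prepositions it contains."""
    return sum(1 for w in _words(term) if w in PREPOSITIONS)


def describe_terms(terms: list[str]) -> list[tuple[str, int, int]]:
    """Return (term, length, complexity) for each term in the set."""
    return [(t, term_length(t), term_complexity(t)) for t in terms]


def mean_profile(terms: list[str]) -> tuple[float, float]:
    """Mean length and mean complexity over a non-empty set of index terms."""
    return mean(term_length(t) for t in terms), mean(term_complexity(t) for t in terms)


if __name__ == "__main__":
    # Hypothetical example terms, not drawn from the study's test materials.
    candidates = [
        "index terms",
        "evaluation of index terms",
        "choice of index terms in an information access task",
    ]
    for term, length, complexity in describe_terms(candidates):
        print(f"{term!r}: {length} words, {complexity} prepositions")
    print("mean (length, complexity):", mean_profile(candidates))
```

These surface counts only describe the terms. In the task-based protocol, the quality of a term set is judged by how well subjects can use it to access information, not by the counts themselves.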