A model for quantitative evaluation of an end-to-end question-answering system

  • Authors:
  • Nina Wacholder¹, Diane Kelly², Paul Kantor¹, Robert Rittman¹, Ying Sun¹, Bing Bai¹, Sharon Small³, Boris Yamrom³, Tomek Strzalkowski³

  • Affiliations:
  • ¹ School of Communication, Information and Library Studies, Rutgers University, New Brunswick, NJ 08901
  • ² School of Information and Library Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599
  • ³ Department of Computer Science, University at Albany, Albany, NY 12222

  • Venue:
  • Journal of the American Society for Information Science and Technology
  • Year:
  • 2007

Abstract

We describe a procedure for quantitative evaluation of interactive question-answering systems and illustrate it with an application to the High-Quality Interactive Question-Answering (HITIQA) system. Our objectives were (a) to design a method to realistically and reliably assess interactive question-answering systems by comparing the quality of reports produced using different systems, (b) to conduct a pilot test of this method, and (c) to perform a formative evaluation of the HITIQA system. Far more important than the specific information gathered from this pilot evaluation is the development of (a) a protocol for evaluating an emerging technology, (b) reusable assessment instruments, and (c) the knowledge gained in conducting the evaluation. We conclude that this method, which uses a surprisingly small number of subjects and does not rely on predetermined relevance judgments, measures the impact of system change on the work produced by users. Therefore, this method can be used to compare the products of interactive systems that use different underlying technologies. © 2007 Wiley Periodicals, Inc.
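
As a concrete illustration of the evaluation design the abstract sketches, the fragment below shows how a paired comparison of report-quality scores might be run. This is a minimal sketch under our own assumptions, not the authors' protocol: the scores, the 1-7 rating scale, and the choice of the Wilcoxon signed-rank test are hypothetical. The point it illustrates is that each subject produces a report under both system conditions, so even a small subject pool supports a paired test of whether a system change affected the quality of the work produced.

    # Minimal sketch (not from the paper): paired comparison of report
    # quality for the same subjects under two system conditions.
    # The scores, the 1-7 scale, and the Wilcoxon signed-rank test are
    # illustrative assumptions, not the authors' actual instrument.
    from scipy.stats import wilcoxon

    # Hypothetical mean assessor ratings (1-7) for reports written by the
    # same seven subjects, once with a baseline system and once with HITIQA.
    baseline_scores = [4.2, 3.8, 5.0, 4.5, 3.9, 4.1, 4.8]
    hitiqa_scores   = [4.9, 4.4, 5.1, 5.2, 4.0, 4.7, 5.3]

    # Paired test: each subject serves as their own control, which is what
    # lets a surprisingly small number of subjects yield a usable signal.
    stat, p_value = wilcoxon(baseline_scores, hitiqa_scores)
    print(f"Wilcoxon statistic = {stat:.2f}, p = {p_value:.3f}")

Because the comparison is made directly on the reports subjects produce, no predetermined relevance judgments are required; any report-quality instrument could stand in for the hypothetical ratings above.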