A model for quantitative evaluation of an end-to-end question-answering system

  • Authors:
  • Nina Wacholder¹, Diane Kelly², Paul Kantor¹, Robert Rittman¹, Ying Sun¹, Bing Bai¹, Sharon Small³, Boris Yamrom³, Tomek Strzalkowski³

  • Affiliations:
  • ¹ School of Communication, Information and Library Studies, Rutgers University, New Brunswick, NJ 08901
  • ² School of Information and Library Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599
  • ³ Department of Computer Science, University at Albany, Albany, NY 12222

  • Venue:
  • Journal of the American Society for Information Science and Technology
  • Year:
  • 2007

Abstract

We describe a procedure for quantitative evaluation of interactive question-answering systems and illustrate it with an application to the High-Quality Interactive Question-Answering (HITIQA) system. Our objectives were (a) to design a method to realistically and reliably assess interactive question-answering systems by comparing the quality of reports produced using different systems, (b) to conduct a pilot test of this method, and (c) to perform a formative evaluation of the HITIQA system. Far more important than the specific information gathered from this pilot evaluation is the development of (a) a protocol for evaluating an emerging technology, (b) reusable assessment instruments, and (c) the knowledge gained in conducting the evaluation. We conclude that this method, which uses a surprisingly small number of subjects and does not rely on predetermined relevance judgments, measures the impact of system change on the work produced by users. Therefore, this method can be used to compare the products of interactive systems that use different underlying technologies. © 2007 Wiley Periodicals, Inc.
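
As a concrete illustration of the evaluation design the abstract sketches, the fragment below shows how a paired comparison of report-quality scores might be run. This is a minimal sketch under our own assumptions, not the authors' protocol: the scores, the 1-7 rating scale, and the choice of the Wilcoxon signed-rank test are hypothetical. The point it illustrates is that each subject produces a report under both system conditions, so even a small subject pool supports a paired test of whether a system change affected the quality of the work produced.

    # Minimal sketch (not from the paper): paired comparison of report
    # quality for the same subjects under two system conditions.
    # The scores, the 1-7 scale, and the Wilcoxon signed-rank test are
    # illustrative assumptions, not the authors' actual instrument.
    from scipy.stats import wilcoxon

    # Hypothetical mean assessor ratings (1-7) for reports written by the
    # same seven subjects, once with a baseline system and once with HITIQA.
    baseline_scores = [4.2, 3.8, 5.0, 4.5, 3.9, 4.1, 4.8]
    hitiqa_scores   = [4.9, 4.4, 5.1, 5.2, 4.0, 4.7, 5.3]

    # Paired test: each subject serves as their own control, which is what
    # lets a surprisingly small number of subjects yield a usable signal.
    stat, p_value = wilcoxon(baseline_scores, hitiqa_scores)
    print(f"Wilcoxon statistic = {stat:.2f}, p = {p_value:.3f}")

Because the comparison is made directly on the reports subjects produce, no predetermined relevance judgments are required; any report-quality instrument could stand in for the hypothetical ratings above.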