We describe a procedure for the quantitative evaluation of interactive question-answering systems and illustrate it with an application to the High-Quality Interactive Question-Answering (HITIQA) system. Our objectives were (a) to design a method to realistically and reliably assess interactive question-answering systems by comparing the quality of reports produced using different systems, (b) to conduct a pilot test of this method, and (c) to perform a formative evaluation of the HITIQA system. Far more important than the specific information gathered from this pilot evaluation is the development of (a) a protocol for evaluating an emerging technology, (b) reusable assessment instruments, and (c) the knowledge gained in conducting the evaluation. We conclude that this method, which uses a surprisingly small number of subjects and does not rely on predetermined relevance judgments, measures the impact of system changes on the work produced by users. This method can therefore be used to compare the products of interactive systems that use different underlying technologies. © 2007 Wiley Periodicals, Inc.