Community Question Answering (CQA) sites such as Yahoo! Answers have emerged as rich knowledge resources for information seekers. However, answers posted to CQA sites can be irrelevant, incomplete, redundant, incorrect, biased, ill-formed or even abusive. Hence, automatic selection of "good" answers for a given posted question is a practical research problem that will help us manage the quality of accumulated knowledge. One way to evaluate answer selection systems for CQA would be to use the Best Answers (BAs) that are readily available from the CQA sites. However, BAs may be biased, and even if they are not, there may be other good answers besides BAs. To remedy these two problems, we propose system evaluation methods that involve multiple answer assessors and graded-relevance information retrieval metrics. Our main findings from experiments using the NTCIR-8 CQA task data are that, using our evaluation methods, (a) we can detect many substantial differences between systems that would have been overlooked by BA-based evaluation; and (b) we can better identify hard questions (i.e. those that are handled poorly by many systems and therefore require focussed investigation) compared to BA-based evaluation. We therefore argue that our approach is useful for building effective CQA answer selection systems despite the cost of manual answer assessments.
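To make the evaluation setup concrete: graded-relevance metrics such as normalised discounted cumulative gain (nDCG) give a system partial credit for ranking any good answer highly, rather than a binary hit or miss on the single official Best Answer. Below is a minimal sketch of nDCG, assuming each answer's gain is derived from pooling several assessors' judgements (e.g. the number of assessors who labelled it "good"); the gain values and pooling scheme are illustrative assumptions, not the paper's exact evaluation code, and the paper may use additional metrics from this family.

```python
import math

def dcg(gains):
    """Discounted cumulative gain: `gains` are graded relevance levels
    in the order the system ranked the answers (rank 1 first)."""
    # Rank i (0-based) is discounted by log2(i + 2), so rank 1 has discount 1.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(ranked_gains, cutoff=None):
    """nDCG: DCG of the system's ranking divided by the DCG of the
    ideal (descending-gain) ranking of the same answer pool."""
    k = cutoff if cutoff is not None else len(ranked_gains)
    idcg = dcg(sorted(ranked_gains, reverse=True)[:k])
    return dcg(ranked_gains[:k]) / idcg if idcg > 0 else 0.0

# Hypothetical example: five answers to one question, where each gain is
# the number of assessors who judged that answer "good" (an assumption
# for illustration). The system ranked a gain-2 answer first but buried
# the gain-3 answer at rank 3, so it earns partial credit.
system_ranking = [2, 0, 3, 1, 0]
print(round(ndcg(system_ranking), 3))  # ~0.825; 1.0 would be a perfect ranking
```

Under this kind of metric, a system that surfaces good non-BA answers still scores well, which is exactly what a BA-matching evaluation cannot reward.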