Community Question Answering (CQA) sites such as Yahoo! Answers have emerged as rich knowledge resources for information seekers. However, answers posted to CQA sites can be irrelevant, incomplete, redundant, incorrect, biased, ill-formed or even abusive. Hence, automatic selection of "good" answers for a given posted question is a practical research problem that will help us manage the quality of accumulated knowledge. One way to evaluate answer selection systems for CQA would be to use the Best Answers (BAs) that are readily available from the CQA sites. However, BAs may be biased, and even if they are not, there may be other good answers besides BAs. To remedy these two problems, we propose system evaluation methods that involve multiple answer assessors and graded-relevance information retrieval metrics. Our main findings from experiments using the NTCIR-8 CQA task data are that, using our evaluation methods, (a) we can detect many substantial differences between systems that would have been overlooked by BA-based evaluation; and (b) we can better identify hard questions (i.e. those that are handled poorly by many systems and therefore require focussed investigation) compared to BA-based evaluation. We therefore argue that our approach is useful for building effective CQA answer selection systems despite the cost of manual answer assessments.
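To make the evaluation setup concrete: graded-relevance metrics such as normalised discounted cumulative gain (nDCG) give a system partial credit for ranking any good answer highly, rather than a binary hit or miss on the single official Best Answer. Below is a minimal sketch of nDCG, assuming each answer's gain is derived from pooling several assessors' judgements (e.g. the number of assessors who labelled it "good"); the gain values and pooling scheme are illustrative assumptions, not the paper's exact evaluation code, and the paper may use additional metrics from this family.

```python
import math

def dcg(gains):
    """Discounted cumulative gain: `gains` are graded relevance levels
    in the order the system ranked the answers (rank 1 first)."""
    # Rank i (0-based) is discounted by log2(i + 2), so rank 1 has discount 1.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(ranked_gains, cutoff=None):
    """nDCG: DCG of the system's ranking divided by the DCG of the
    ideal (descending-gain) ranking of the same answer pool."""
    k = cutoff if cutoff is not None else len(ranked_gains)
    idcg = dcg(sorted(ranked_gains, reverse=True)[:k])
    return dcg(ranked_gains[:k]) / idcg if idcg > 0 else 0.0

# Hypothetical example: five answers to one question, where each gain is
# the number of assessors who judged that answer "good" (an assumption
# for illustration). The system ranked a gain-2 answer first but buried
# the gain-3 answer at rank 3, so it earns partial credit.
system_ranking = [2, 0, 3, 1, 0]
print(round(ndcg(system_ranking), 3))  # ~0.825; 1.0 would be a perfect ranking
```

Under this kind of metric, a system that surfaces good non-BA answers still scores well, which is exactly what a BA-matching evaluation cannot reward.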