Finding relevant answers in software forums

Authors:
Swapna Gottipati; David Lo; Jing Jiang
Affiliations:
School of Information Systems, Singapore Management University, Singapore; School of Information Systems, Singapore Management University, Singapore; School of Information Systems, Singapore Management University, Singapore
Venue:
ASE '11 Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering
Year:
2011

Abstract

Online software forums provide a huge amount of valuable content. Developers and users often ask questions in such forums and receive answers there. The vast number of thread discussions available in forums provides ample opportunities for knowledge acquisition and summarization. For a given search query, current search engines use a traditional information retrieval approach to extract webpages containing relevant keywords. However, in software forums there are often many threads containing similar keywords, and each thread can contain many posts, sometimes 1,000 or more. Manually finding relevant answers in these long threads is a painstaking task for users. Finding relevant answers is particularly hard in software forums because (1) the complexity of software systems causes a huge variety of issues, often expressed in similar technical jargon, and (2) software forum users are often expert internet users who post answers in multiple venues, creating many duplicate posts across the World Wide Web, often without satisfying answers. To address this problem, this paper provides a semantic search engine framework that processes software forum threads and recovers relevant answers according to user queries. Unlike a standard information retrieval engine, our framework infers semantic tags of posts in software forum threads and uses these tags to recover relevant answer posts. In our case study, we analyze 6,068 posts from three software forums. Our inferred tags achieve, on average, an overall precision, recall, and F-measure of 67%, 71%, and 69%, respectively.
To empirically study the benefit of our overall framework, we also conduct a user-assisted study. It shows that, compared to a standard information retrieval approach, our proposed framework could increase mean average precision from 17% to 71% in retrieving relevant answers to various queries, and achieve a Normalized Discounted Cumulative Gain (nDCG) @1 score of 91.2% and an nDCG@2 score of 71.6%.
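The evaluation measures named in the abstract (F-measure, mean average precision, and nDCG@k) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names are hypothetical, average precision is assumed to be normalized by the relevant items found in the ranked list, and the DCG discount shown (gain divided by log2 of rank + 1) is one common formulation that the paper may or may not use.

```python
import math


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def average_precision(relevance):
    """Average precision for one ranked list of binary relevance labels.

    Assumption: every relevant item appears somewhere in the list, so the
    sum is normalized by the number of relevant items found.
    """
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this relevant rank
    return total / hits if hits else 0.0


def mean_average_precision(ranked_lists):
    """Mean of average precision over several queries."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the top k graded relevance scores."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains, k):
    """DCG@k divided by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0
```

Under this definition, `f_measure(0.67, 0.71)` evaluates to roughly 0.69, which is consistent with the precision, recall, and F-measure figures reported above, although the paper's averaged figure may be computed differently (for example, per tag and then averaged).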