Active refinement of clone anomaly reports

  • Authors:
  • Lucia;David Lo;Lingxiao Jiang;Aditya Budi

  • Affiliations:
  • Singapore Management University, Singapore;Singapore Management University, Singapore;Singapore Management University, Singapore;Singapore Management University, Singapore

  • Venue:
  • Proceedings of the 34th International Conference on Software Engineering
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Software clones have been widely studied in the recent literature and shown useful for finding bugs because inconsistent changes among clones in a clone group may indicate potential bugs. However, many inconsistent clone groups are not real bugs. The excessive number of false positives could easily impede broad adoption of clone-based bug detection approaches. In this work, we aim to improve the usability of clonebased bug detection tools by increasing the rate of true positives found when a developer analyzes anomaly reports. Our idea is to control the number of anomaly reports a user can see at a time and actively incorporate incremental user feedback to continually refine the anomaly reports. Our system first presents top few anomaly reports from the list of reports generated by a tool in its default ordering. Users then either accept or reject each of the reports. Based on the feedback, our system automatically and iteratively refines a classification model for anomalies and re-sorts the rest of the reports. Our goal is to present the true positives to the users earlier than the default ordering. The rationale of the idea is based on our observation that false positives among the inconsistent clone groups could share common features (in terms of code structure, programming patterns, etc.), and these features can be learned from the incremental user feedback. We evaluate our refinement process on three sets of clonebased anomaly reports from three large real programs: the Linux Kernel (C), Eclipse, and ArgoUML (Java), extracted by a clone-based anomaly detection tool. The results show that compared to the original ordering of bug reports, we can improve the rate of true positives found (i.e., true positives are found faster) by 11%, 87%, and 86% for Linux kernel, Eclipse, and ArgoUML, respectively.