Active refinement of clone anomaly reports

Authors:
Lucia; David Lo; Lingxiao Jiang; Aditya Budi
Affiliations:
Singapore Management University, Singapore; Singapore Management University, Singapore; Singapore Management University, Singapore; Singapore Management University, Singapore
Venue:
Proceedings of the 34th International Conference on Software Engineering
Year:
2012

Abstract

Software clones have been widely studied in the recent literature and shown to be useful for finding bugs, because inconsistent changes among clones in a clone group may indicate potential bugs. However, many inconsistent clone groups are not real bugs, and the excessive number of false positives could easily impede broad adoption of clone-based bug detection approaches. In this work, we aim to improve the usability of clone-based bug detection tools by increasing the rate of true positives found when a developer analyzes anomaly reports. Our idea is to control the number of anomaly reports a user sees at a time and to actively incorporate incremental user feedback to continually refine the anomaly reports. Our system first presents the top few anomaly reports from the list generated by a tool, in the tool's default ordering. Users then either accept or reject each of these reports. Based on the feedback, our system automatically and iteratively refines a classification model for anomalies and re-sorts the remaining reports. Our goal is to present true positives to users earlier than the default ordering would. The rationale is our observation that false positives among the inconsistent clone groups may share common features (in terms of code structure, programming patterns, etc.), and that these features can be learned from the incremental user feedback. We evaluate our refinement process on three sets of clone-based anomaly reports from three large real programs: the Linux kernel (C), Eclipse, and ArgoUML (Java), extracted by a clone-based anomaly detection tool. The results show that, compared to the original ordering of the anomaly reports, we can improve the rate of true positives found (i.e., true positives are found faster) by 11%, 87%, and 86% for the Linux kernel, Eclipse, and ArgoUML, respectively.
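The feedback loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the features or the classification model, so each report is assumed here to carry a hypothetical numeric feature vector, and a simple nearest-centroid score stands in for the classifier. The names `active_refine`, `oracle`, and `batch_size` are invented for this sketch.

```python
from typing import Callable, List, Sequence, Tuple

# A report is (report id, feature vector). The feature vector is a
# hypothetical stand-in for whatever code-structure features a real tool uses.
Report = Tuple[str, Sequence[float]]


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def active_refine(
    reports: List[Report],
    oracle: Callable[[Report], bool],
    batch_size: int = 2,
) -> List[str]:
    """Return report ids in the order they end up being shown to the user.

    `reports` is the tool's default ordering. `oracle` plays the role of the
    user: it returns True to accept a report (real bug) and False to reject it.
    """
    remaining = list(reports)
    accepted: List[Sequence[float]] = []
    rejected: List[Sequence[float]] = []
    shown: List[str] = []

    while remaining:
        # Show only the top few reports of the current ordering.
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        for report in batch:
            shown.append(report[0])
            (accepted if oracle(report) else rejected).append(report[1])

        # Re-sort the rest once both kinds of feedback have been seen.
        # Assumption: nearest-centroid scoring replaces the paper's model.
        if accepted and rejected and remaining:
            pos, neg = centroid(accepted), centroid(rejected)
            # Lower score = closer to accepted reports than to rejected ones.
            # list.sort is stable, so ties keep their previous relative order.
            remaining.sort(
                key=lambda r: sq_dist(r[1], pos) - sq_dist(r[1], neg)
            )

    return shown
```

Calling `active_refine(reports, oracle)` with a list in the tool's default order returns the order in which reports would be presented; if rejected reports cluster in feature space, later batches tend to surface accepted-looking reports first. Until both an accepted and a rejected report have been seen, the default ordering is left unchanged.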