Can I clone this piece of code here?

Authors:
Xiaoyin Wang;Yingnong Dang;Lu Zhang;Dongmei Zhang;Erica Lan;Hong Mei
Affiliations:
Peking University, China;Microsoft Research, China;Peking University, China;Microsoft Research, China;Microsoft, USA;Peking University, China
Venue:
Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering
Year:
2012

Abstract

Code cloning is a convenient way for developers to reuse existing code, but it can degrade code quality and increase maintenance costs. Some cloned code pieces are considered harmless because they evolve independently, while others are considered harmful because they must be changed consistently, which incurs extra maintenance costs. Recent studies show that neither the percentage of harmful code clones nor that of harmless code clones is negligible. To help developers benefit from harmless cloning and avoid the negative impact of harmful cloning, we propose a novel approach that automatically predicts the harmfulness of a code cloning operation at the moment of copy-and-paste. Our insight is that the potential harmfulness of a cloning operation may be related to characteristics of the code to be cloned and of its context. Using a number of features extracted from the cloned code and from the context of the cloning operation, we apply Bayesian Networks, a machine-learning technique, to predict the harmfulness of an intended cloning operation. We evaluated our approach on two large-scale industrial software projects under two usage scenarios: 1) approving only cloning operations predicted to be very likely harmless, and 2) blocking only cloning operations predicted to be very likely harmful. In the first scenario, our approach approves more than 50% of cloning operations with a precision higher than 94.9% in both subjects. In the second scenario, our approach avoids more than 48% of the harmful cloning operations by blocking only 15% of the cloning operations for the first subject, and more than 67% of the harmful cloning operations by blocking only 34% of the cloning operations for the second subject.
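
The two usage scenarios can be illustrated with a minimal sketch. It is not the authors' implementation: it substitutes a naive Bayes classifier (the simplest special case of a Bayesian network) for the paper's model, and the feature names, the toy history, and both thresholds are hypothetical values chosen only for illustration.

```python
from collections import defaultdict


def train_naive_bayes(samples):
    """Fit a naive Bayes model on (features, harmful) pairs.

    `features` maps a feature name to a bool. The returned function
    maps a feature dict to an estimate of P(harmful | features).
    """
    class_counts = {True: 0, False: 0}
    feature_counts = {True: defaultdict(int), False: defaultdict(int)}
    for features, harmful in samples:
        class_counts[harmful] += 1
        for name, value in features.items():
            if value:
                feature_counts[harmful][name] += 1
    total = len(samples)

    def prob_harmful(features):
        scores = {}
        for label in (True, False):
            # Laplace-smoothed class prior.
            p = (class_counts[label] + 1) / (total + 2)
            for name, value in features.items():
                # Laplace-smoothed P(feature is True | label).
                p_true = (feature_counts[label][name] + 1) / (class_counts[label] + 2)
                p *= p_true if value else 1 - p_true
            scores[label] = p
        return scores[True] / (scores[True] + scores[False])

    return prob_harmful


def decide(p_harmful, approve_below=0.1, block_above=0.9):
    """Scenario 1 approves only near-certain harmless clones;
    scenario 2 blocks only near-certain harmful ones.
    Both thresholds are assumed values, not taken from the paper."""
    if p_harmful <= approve_below:
        return "approve"
    if p_harmful >= block_above:
        return "block"
    return "undecided"


# Hypothetical history of past cloning operations and their outcomes.
TOY_HISTORY = (
    [({"recently_changed": True, "many_callers": True}, True)] * 4
    + [({"recently_changed": False, "many_callers": False}, False)] * 4
)

predict = train_naive_bayes(TOY_HISTORY)
for candidate in (
    {"recently_changed": True, "many_callers": True},
    {"recently_changed": False, "many_callers": False},
    {"recently_changed": True, "many_callers": False},
):
    print(candidate, decide(predict(candidate)))
```

Keeping the approve and block thresholds separate leaves a middle band of undecided operations, which mirrors the abstract's framing: each scenario acts only on predictions that are very likely correct and makes no claim about the rest.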