Problems Creating Task-relevant Clone Detection Reference Data

  • Authors:
  • Andrew Walenstein, Nitin Jyoti, Junwei Li, Yun Yang, Arun Lakhotia


  • Venue:
  • WCRE '03 Proceedings of the 10th Working Conference on Reverse Engineering
  • Year:
  • 2003

Abstract

One prevalent method for evaluating the results of automated software analysis tools is to compare the tools' output to the judgment of human experts. This evaluation strategy is commonly assumed in the field of software clone detector research. We report our experiences from a study in which several human judges tried to establish "reference sets" of function clones for several medium-sized software systems written in C. The study employed multiple judges and followed a process typical of inter-coder reliability assurance, wherein coders discussed classification discrepancies until consensus was reached. A high level of disagreement was found for reference sets made specifically for reengineering task contexts. The results, although preliminary, raise questions about the limitations of prior clone detector evaluations and other similar tool evaluations. Implications are drawn for future work on reference data generation, tool evaluations, and benchmarking efforts.
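
The abstract mentions inter-coder reliability assurance among the human judges but does not name a specific agreement metric. As a minimal sketch only, the snippet below shows one common way such agreement could be quantified, using Cohen's kappa over binary "clone / not clone" verdicts; the judge labels and the choice of kappa are illustrative assumptions, not the authors' reported method.

```python
# Illustrative sketch: Cohen's kappa for two judges labeling candidate
# function pairs as "clone" or "not". The metric and the labels below are
# assumptions for illustration; the paper does not specify this procedure.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters over the same candidate clone pairs."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of candidates both judges labeled the same.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum((freq_a[l] / n) * (freq_b[l] / n)
              for l in set(labels_a) | set(labels_b))
    return (p_o - p_e) / (1 - p_e) if p_e != 1 else 1.0

# Hypothetical verdicts from two judges on eight candidate function pairs.
judge_1 = ["clone", "clone", "not", "clone", "not", "not", "clone", "not"]
judge_2 = ["clone", "not", "not", "clone", "not", "clone", "clone", "not"]
print(f"kappa = {cohens_kappa(judge_1, judge_2):.2f}")  # kappa = 0.50
```

With the invented labels above, the two judges agree on six of eight pairs, giving a kappa of 0.50; values well below 1 would reflect the kind of judge disagreement the study reports for task-specific reference sets.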