Problems Creating Task-relevant Clone Detection Reference Data

  • Authors:
  • Andrew Walenstein, Nitin Jyoti, Junwei Li, Yun Yang, Arun Lakhotia


  • Venue:
  • WCRE '03 Proceedings of the 10th Working Conference on Reverse Engineering
  • Year:
  • 2003

Abstract

One prevalent method for evaluating the results of automated software analysis tools is to compare the tools' output to the judgment of human experts. This evaluation strategy is commonly assumed in the field of software clone detector research. We report our experiences from a study in which several human judges tried to establish "reference sets" of function clones for several medium-sized software systems written in C. The study employed multiple judges and followed a process typical of inter-coder reliability assurance, wherein coders discussed classification discrepancies until consensus was reached. A high level of disagreement was found for reference sets made specifically for reengineering task contexts. The results, although preliminary, raise questions about the limitations of prior clone detector evaluations and other similar tool evaluations. Implications are drawn for future work on reference data generation, tool evaluations, and benchmarking efforts.
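
The abstract mentions inter-coder reliability assurance among the human judges but does not name a specific agreement metric. As a minimal sketch only, the snippet below shows one common way such agreement could be quantified, using Cohen's kappa over binary "clone / not clone" verdicts; the judge labels and the choice of kappa are illustrative assumptions, not the authors' reported method.

```python
# Illustrative sketch: Cohen's kappa for two judges labeling candidate
# function pairs as "clone" or "not". The metric and the labels below are
# assumptions for illustration; the paper does not specify this procedure.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters over the same candidate clone pairs."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of candidates both judges labeled the same.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum((freq_a[l] / n) * (freq_b[l] / n)
              for l in set(labels_a) | set(labels_b))
    return (p_o - p_e) / (1 - p_e) if p_e != 1 else 1.0

# Hypothetical verdicts from two judges on eight candidate function pairs.
judge_1 = ["clone", "clone", "not", "clone", "not", "not", "clone", "not"]
judge_2 = ["clone", "not", "not", "clone", "not", "clone", "clone", "not"]
print(f"kappa = {cohens_kappa(judge_1, judge_2):.2f}")  # kappa = 0.50
```

With the invented labels above, the two judges agree on six of eight pairs, giving a kappa of 0.50; values well below 1 would reflect the kind of judge disagreement the study reports for task-specific reference sets.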