Tuning research tools for scalability and performance: The NiCad experience

  • Authors:
  • James R. Cordy;Chanchal K. Roy

  • Affiliations:
  • School of Computing, Queen's University, Kingston, Ontario, Canada;Department of Computer Science, University of Saskatchewan, Saskatoon, Saskatchewan, Canada

  • Venue:
  • Science of Computer Programming
  • Year:
  • 2014

Quantified Score

Hi-index 0.00

Visualization

Abstract

Clone detection is a research technique for analyzing software systems for similarities, with applications in software understanding, maintenance, evolution, license enforcement and many other issues. The NiCad near-miss clone detection method has been shown to yield highly accurate results in both precision and recall. However, its naive two-step method, involving a parsing first step to identify and normalize code fragments, followed by a text line-based second step using longest common subsequence (LCS) to compare fragments, has proven difficult to migrate to the efficiency and scalability required for large scale research applications. Rather than presenting the NiCad tool itself in detail, this paper focuses on our experience in migrating NiCad from an initial rapid prototype to a practical scalable research tool. The process has increased overall performance by a factor of up to 40 and clone detection speed by a factor of over 400, while reducing memory and processor requirements to fit on a standard laptop. We apply a sequence of four different kinds of performance optimizations and analyze the effect of each optimization in detail. We believe that the lessons of our experience in migrating NiCad from research prototype to production performance may be beneficial to others who are facing a similar problem.