Learning from 6,000 projects: lightweight cross-project anomaly detection

  • Authors: Natalie Gruska; Andrzej Wasylkowski; Andreas Zeller

  • Affiliations: Queen's University, Kingston, ON, Canada; Saarland University, Saarbrücken, Germany; Saarland University, Saarbrücken, Germany

  • Venue: Proceedings of the 19th International Symposium on Software Testing and Analysis
  • Year: 2010

Abstract

Real production code contains lots of knowledge about the domain, the architecture, and the environment. How can we leverage this knowledge in new projects? Using a novel lightweight source code parser, we have mined more than 6,000 open-source Linux projects (totaling 200,000,000 lines of code) to obtain 16,000,000 temporal properties reflecting normal interface usage. New projects can be checked against these rules to detect anomalies, that is, code that deviates from the wisdom of the crowds. In a sample of 20 projects, about 25% of the top-ranked anomalies uncovered actual code smells or defects.
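
To make the idea of mining temporal properties and checking new code against them concrete, here is a minimal sketch in Python. It is not the authors' parser or ranking method, which the abstract does not detail. It assumes each function has already been reduced to a flat list of called function names, treats a temporal property as "a call to a is later followed by a call to b", and uses simple support and confidence thresholds. All names (`temporal_pairs`, `mine_rules`, `find_anomalies`) and the threshold values are hypothetical.

```python
from collections import Counter


def temporal_pairs(calls):
    """Return all ordered pairs (a, b) where a call to a precedes a call to b."""
    pairs = set()
    for i, a in enumerate(calls):
        for b in calls[i + 1:]:
            if a != b:
                pairs.add((a, b))
    return pairs


def mine_rules(corpus, min_support=3, min_confidence=0.9):
    """Mine rules "a is followed by b" from a corpus of call sequences.

    Assumption: support is the number of sequences containing the pair, and
    confidence is that number divided by the number of sequences calling a.
    """
    calls_a = Counter()
    pair_count = Counter()
    for calls in corpus:
        for a in set(calls):
            calls_a[a] += 1
        for pair in temporal_pairs(calls):
            pair_count[pair] += 1

    rules = {}
    for (a, b), support in pair_count.items():
        confidence = support / calls_a[a]
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


def find_anomalies(functions, rules):
    """Report functions that call a but never follow it with b, ranked by confidence."""
    anomalies = []
    for name, calls in functions.items():
        called = set(calls)
        present = temporal_pairs(calls)
        for (a, b), confidence in rules.items():
            if a in called and (a, b) not in present:
                anomalies.append((confidence, name, a, b))
    return sorted(anomalies, reverse=True)


# Toy "crowd" corpus: every sequence that opens a file also closes it.
corpus = [
    ["fopen", "fread", "fclose"],
    ["fopen", "fwrite", "fclose"],
    ["fopen", "fgets", "fclose"],
    ["fopen", "fclose"],
]
rules = mine_rules(corpus)

# Hypothetical new project: load_config opens a file but never closes it.
new_project = {
    "load_config": ["fopen", "fread"],
    "save_config": ["fopen", "fwrite", "fclose"],
}
anomalies = find_anomalies(new_project, rules)
print(anomalies)
```

In this toy run, the only rule that survives the thresholds is that `fopen` is followed by `fclose`, so `load_config` is flagged as deviating from common usage. A real cross-project setting would need a parser to extract the call sequences and a more careful ranking to keep false positives manageable.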