Classification of source code archives

  • Authors:
  • Robert Krovetz;Secil Ugurel;C. Lee Giles

  • Affiliations:
  • NEC Laboratories America, Princeton, NJ;NEC Laboratories America, Princeton, NJ;Pennsylvania State University, University Park, PA

  • Venue:
  • Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

The World Wide Web contains a number of source code archives. Programs are usually classified into various categories within the archive by hand. We report on experiments for automatic classification of source code into these categories. We examined a number of factors that affect classification accuracy. Weighting features by expected entropy loss makes a significant improvement in classification accuracy. We show a Support Vector Machine can be trained to classify source code with a high degree of accuracy. We feel these results show promise for software reuse.