Toward mining "concept keywords" from identifiers in large software projects

  • Authors:
  • Masaru Ohba;Katsuhiko Gondow

  • Affiliations:
  • Tokyo Institute of Technology, Tokyo, Japan;Tokyo Institute of Technology, Tokyo, Japan

  • Venue:
  • MSR '05 Proceedings of the 2005 international workshop on Mining software repositories
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

We propose the Concept Keyword Term Frequency/Inverse Document Frequency (ckTF/IDF) method as a novel technique to efficiency mine concept keywords from identifiers in large software projects. ckTF/IDF is suitable for mining concept keywords, since the ckTF/IDF is more lightweight than the TF/IDF method, and the ckTF/IDF's heuristics is tuned for identifiers in programs.We then experimentally apply the ckTF/IDF to our educational operating system udos, consisting of around 5,000 lines in C code, which produced promising results; the udos's source code was processed in 1.4 seconds with an accuracy of around 57%. This preliminary result suggests that our approach is useful for mining concept keywords from identifiers, although we need more research and experience.