Toward mining "concept keywords" from identifiers in large software projects

Authors:
Masaru Ohba; Katsuhiko Gondow
Affiliations:
Tokyo Institute of Technology, Tokyo, Japan; Tokyo Institute of Technology, Tokyo, Japan
Venue:
MSR '05 Proceedings of the 2005 international workshop on Mining software repositories
Year:
2005

Citing 9

Term-weighting approaches in automatic text retrieval
Information Processing and Management: an International Journal

Literate programming
Literate programming

Overview of reverse engineering and reuse research
Information and Software Technology

An empirical study of static call graph extractors
Proceedings of the 18th international conference on Software engineering

Extracting concepts from file names: a new file clustering criterion
Proceedings of the 20th international conference on Software engineering

Nomen Est Omen: Analyzing the Language of Function Identifiers
WCRE '99 Proceedings of the Sixth Working Conference on Reverse Engineering

Characterizing the Informal Knowledge Contained in Systems
WCRE '01 Proceedings of the Eighth Working Conference on Reverse Engineering (WCRE'01)

Restructuring Program Identifier Names
ICSM '00 Proceedings of the International Conference on Software Maintenance (ICSM'00)

Binary-Level Lightweight Data Integration to Develop Program Understanding Tools for Embedded Software in C
APSEC '04 Proceedings of the 11th Asia-Pacific Software Engineering Conference

Cited 6

Fine grained indexing of software repositories to support impact analysis
Proceedings of the 2006 international workshop on Mining software repositories

Exploring the neighborhood with dora to expedite software maintenance
Proceedings of the twenty-second IEEE/ACM international conference on Automated software engineering

A survey and taxonomy of approaches for mining software repositories in the context of software evolution
Journal of Software Maintenance and Evolution: Research and Practice

Automatically capturing source code context of NL-queries for software maintenance and reuse
ICSE '09 Proceedings of the 31st International Conference on Software Engineering

Automatically detecting and describing high level actions within methods
Proceedings of the 33rd International Conference on Software Engineering

The impact of identifier style on effort and comprehension
Empirical Software Engineering

Abstract

We propose the Concept Keyword Term Frequency/Inverse Document Frequency (ckTF/IDF) method as a novel technique to efficiently mine concept keywords from identifiers in large software projects. ckTF/IDF is suitable for mining concept keywords because it is more lightweight than the TF/IDF method and its heuristics are tuned for identifiers in programs. We then experimentally apply ckTF/IDF to our educational operating system udos, which consists of around 5,000 lines of C code, with promising results: the udos source code was processed in 1.4 seconds with an accuracy of around 57%. This preliminary result suggests that our approach is useful for mining concept keywords from identifiers, although more research and experience are needed.
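
For orientation, the classic TF/IDF weighting that the abstract uses as its baseline can be sketched over words extracted from identifiers. This is not the authors' ckTF/IDF: the abstract gives neither its formula nor its heuristics. The identifier-splitting rule, the function names, and the sample file contents below are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def split_identifier(identifier):
    """Split an identifier into lowercase words (assumed rule: underscores and camelCase)."""
    words = []
    for part in re.split(r"[_\W]+", identifier):
        # Acronym runs, capitalized or lowercase words, and digit runs.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


def tf_idf(files):
    """Plain TF/IDF. `files` maps a file name to the identifiers found in it.

    Returns {file name: {word: score}}.
    """
    # Word counts per file, treating each file as one "document".
    counts = {
        name: Counter(w for ident in idents for w in split_identifier(ident))
        for name, idents in files.items()
    }
    n_files = len(counts)
    # Number of files in which each word occurs at least once.
    doc_freq = Counter(w for c in counts.values() for w in c)

    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            w: (n / total) * math.log(n_files / doc_freq[w]) for w, n in c.items()
        }
    return scores


# Hypothetical identifiers, not taken from udos.
sample = {
    "sched.c": ["schedule_task", "taskQueue", "next_task"],
    "mm.c": ["page_table", "allocPage", "free_page"],
    "fs.c": ["read_block", "writeBlock", "task_io"],
}

for name, word_scores in tf_idf(sample).items():
    best = max(word_scores, key=word_scores.get)
    print(name, best, round(word_scores[best], 3))
```

In this sketch a word that appears in every file receives a score of zero, so only words concentrated in a few files rank highly. A method tuned for identifiers would presumably adjust the splitting and weighting steps, but those details are outside what the abstract states.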