Using text mining and link analysis for software mining

  • Authors:
  • Miha Grcar;Marko Grobelnik;Dunja Mladenic

  • Affiliations:
  • Jozef Stefan Institute, Dept. of Knowledge Technologies, Ljubljana, Slovenia;Jozef Stefan Institute, Dept. of Knowledge Technologies, Ljubljana, Slovenia;Jozef Stefan Institute, Dept. of Knowledge Technologies, Ljubljana, Slovenia

  • Venue:
  • MCD'07 Proceedings of the 3rd ECML/PKDD international conference on Mining complex data
  • Year:
  • 2007

Abstract

Many data mining techniques are now used for ontology learning: text mining, Web mining, graph mining, link analysis, relational data mining, and so on. What this state-of-the-art toolbox lacks is "software mining" techniques, a term that denotes the process of extracting knowledge from source code. In this paper we approach the software mining task with a combination of text mining and link analysis techniques. We discuss how each instance (i.e. a programming construct such as a class or a method) can be converted into a feature vector that combines information about how the instance is interlinked with other instances with information about its (textual) content. The resulting feature vectors serve as the basis for constructing the domain ontology with OntoGen, an existing system for semi-automatic data-driven ontology construction.
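
The idea of a feature vector that combines textual content with link information can be illustrated with a minimal sketch. This is not the authors' construction: the abstract does not specify the weighting scheme or the link representation, so everything below is an assumption made for illustration. The toy instances, the function names (tokenize, build_feature_vectors, cosine), the smoothed TF-IDF variant, the binary undirected link indicators, and the link_weight parameter are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy data: instance name -> (textual content, names it links to).
INSTANCES = {
    "Parser": ("parse source file into syntax tree", ["Tokenizer", "SyntaxTree"]),
    "Tokenizer": ("split source text into tokens", []),
    "SyntaxTree": ("tree of syntax nodes", []),
}


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector stays all-zero."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def build_feature_vectors(instances, link_weight=0.5):
    """Return (vectors, vocab, names); each vector is text part + link part."""
    names = sorted(instances)
    docs = {n: Counter(tokenize(instances[n][0])) for n in names}
    vocab = sorted({term for doc in docs.values() for term in doc})

    # Smoothed inverse document frequency (one common variant, assumed here).
    doc_freq = Counter(term for doc in docs.values() for term in doc)
    idf = {t: math.log((1 + len(names)) / (1 + doc_freq[t])) + 1.0 for t in vocab}

    # Treat links as undirected: an instance is a neighbour of everything
    # it references and of everything that references it.
    neighbours = {n: set() for n in names}
    for source, (_, targets) in instances.items():
        for target in targets:
            if target in neighbours and target != source:
                neighbours[source].add(target)
                neighbours[target].add(source)

    vectors = {}
    for n in names:
        text_part = l2_normalize([docs[n][t] * idf[t] for t in vocab])
        link_part = l2_normalize([1.0 if m in neighbours[n] else 0.0 for m in names])
        # Weighted concatenation: link_weight trades content against structure.
        vectors[n] = ([(1.0 - link_weight) * v for v in text_part]
                      + [link_weight * v for v in link_part])
    return vectors, vocab, names


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    vectors, vocab, names = build_feature_vectors(INSTANCES)
    for a in names:
        for b in names:
            if a < b:
                print(a, b, round(cosine(vectors[a], vectors[b]), 3))
```

In this sketch, setting link_weight to 0 reduces the representation to plain bag-of-words vectors, while setting it to 1 uses link structure alone. Vectors of this general kind could then be passed to a clustering or ontology-construction tool; how OntoGen consumes them is described in the paper, not here.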