A methodology for mining document-enriched heterogeneous information networks

  • Authors:
  • Miha Grcar;Nada Lavrac

  • Affiliations:
  • Jožef Stefan Institute, Dept. of Knowledge Technologies, Ljubljana, Slovenia;Jožef Stefan Institute, Dept. of Knowledge Technologies, Ljubljana, Slovenia

  • Venue:
  • DS'11 Proceedings of the 14th international conference on Discovery science
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

The paper presents a new methodology for mining heterogeneous information networks, motivated by the fact that, in many real-life scenarios, documents are available in heterogeneous information networks, such as interlinked multimedia objects containing titles, descriptions, and subtitles. The methodology consists of transforming documents into bag-of-words vectors, decomposing the corresponding heterogeneous network into separate graphs and computing structural-context feature vectors with PageRank, and finally constructing a common feature vector space in which knowledge discovery is performed. We exploit this feature vector construction process to devise an efficient classification algorithm. We demonstrate the approach by applying it to the task of categorizing video lectures. We show that our approach exhibits low time and space complexity without compromising classification accuracy.