A methodology for mining document-enriched heterogeneous information networks

Authors:
Miha Grcar;Nada Lavrac
Affiliations:
Jožef Stefan Institute, Dept. of Knowledge Technologies, Ljubljana, Slovenia;Jožef Stefan Institute, Dept. of Knowledge Technologies, Ljubljana, Slovenia
Venue:
DS'11 Proceedings of the 14th international conference on Discovery science
Year:
2011

Abstract

The paper presents a new methodology for mining heterogeneous information networks, motivated by the fact that, in many real-life scenarios, documents are available in heterogeneous information networks, such as interlinked multimedia objects containing titles, descriptions, and subtitles. The methodology consists of transforming documents into bag-of-words vectors, decomposing the corresponding heterogeneous network into separate graphs and computing structural-context feature vectors with PageRank, and finally constructing a common feature vector space in which knowledge discovery is performed. We exploit this feature vector construction process to devise an efficient classification algorithm. We demonstrate the approach by applying it to the task of categorizing video lectures. We show that our approach exhibits low time and space complexity without compromising classification accuracy.
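
The pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify term weighting, how the feature vectors are combined, or which classifier is used, so the plain term-frequency weights, the fixed combination weights, the nearest-centroid classifier, and all function names and toy data below are assumptions made for illustration. For brevity the sketch uses a single graph, whereas the methodology decomposes the heterogeneous network into several graphs and would compute one set of PageRank vectors per graph.

```python
import math
from collections import Counter


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def bag_of_words(docs):
    """One unit-length term-frequency vector per document."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    vectors = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        vectors.append(l2_normalize([counts[word] for word in vocab]))
    return vectors


def personalized_pagerank(adj, source, damping=0.85, iters=100):
    """Power iteration with all restart mass placed on `source`.

    The resulting score vector serves as the structural-context
    feature vector of the source node.
    """
    n = len(adj)
    rank = [0.0] * n
    rank[source] = 1.0
    for _ in range(iters):
        nxt = [0.0] * n
        for u in range(n):
            if adj[u]:
                share = damping * rank[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
            else:
                # Dangling node: send its mass back to the source.
                nxt[source] += damping * rank[u]
        nxt[source] += 1.0 - damping
        rank = nxt
    return rank


def combine(feature_sets, weights):
    """Concatenate each object's vectors, scaled by sqrt(weight).

    With this scaling (an assumption), the dot product of two combined
    vectors is the weighted sum of their per-set cosine similarities.
    """
    combined = []
    for i in range(len(feature_sets[0])):
        row = []
        for vectors, w in zip(feature_sets, weights):
            row.extend(math.sqrt(w) * x for x in l2_normalize(vectors[i]))
        combined.append(row)
    return combined


def centroid(vectors):
    """Unit-length mean of a list of vectors."""
    return l2_normalize([sum(col) / len(vectors) for col in zip(*vectors)])


def classify(vec, centroids):
    """Return the label whose centroid has the largest dot product with vec."""
    return max(centroids,
               key=lambda lab: sum(a * b for a, b in zip(vec, centroids[lab])))


# Hypothetical toy data: four lecture descriptions and one link graph.
docs = [
    "graph mining networks",
    "network link mining",
    "protein folding biology",
    "biology cell protein",
]
adj = [[1], [0], [3], [2]]      # adjacency lists: 0 <-> 1 and 2 <-> 3
labels = {0: "cs", 2: "bio"}    # labeled training objects

text_vectors = bag_of_words(docs)
graph_vectors = [personalized_pagerank(adj, i) for i in range(len(docs))]
combined = combine([text_vectors, graph_vectors], weights=[0.5, 0.5])

centroids = {}
for label in set(labels.values()):
    members = [combined[i] for i, lab in labels.items() if lab == label]
    centroids[label] = centroid(members)

predictions = {i: classify(combined[i], centroids) for i in (1, 3)}
print(predictions)
```

In this toy setup the unlabeled objects 1 and 3 share both vocabulary and graph neighborhood with one labeled object each, so either feature set alone would separate them; the combination weights only start to matter when textual and structural evidence disagree.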