Understanding the roles of sub-graph features for graph classification: an empirical study perspective

Authors:
Ting Guo;Xingquan Zhu
Affiliations:
QCIS Centre, Faculty of Eng. & Info. Tech., University of Technology, Sydney, NSW 2007, Australia (both authors)
Venue:
Proceedings of the 22nd ACM International Conference on Information & Knowledge Management (CIKM '13)
Year:
2013

Abstract

Graph classification concerns learning discriminative models from structured training data in order to classify previously unseen graphs into specific categories; the main challenge is to exploit the structural information in the training data when building classifiers. One of the most common graph classification approaches uses sub-graph features to convert graphs into an instance-feature representation, so that generic learning algorithms can be applied to derive learning models. Finding good sub-graph features is regarded as an important task for this type of learning approach, yet there is no comprehensive understanding of (1) how effectively sub-graph features can be used for graph classification; (2) how many sub-graph features are sufficient for good classification results; (3) whether the length of sub-graph features plays a major role in classification; and (4) whether random sub-graphs can be used for graph representation and classification. Motivated by these concerns, we carry out empirical studies on four real-world graph classification tasks using three types of sub-graph features (frequent sub-graphs, frequent sub-graphs selected by information gain, and random sub-graphs) and two learning algorithms (Support Vector Machines and Nearest Neighbour). Our experiments show that (1) the discriminative power of sub-graphs varies with their size; (2) random sub-graphs perform reasonably well; (3) the number of sub-graphs is important for ensuring good performance; and (4) increasing the number of sub-graphs reduces the performance gap between classifiers built from different types of sub-graph features. Our studies provide practical guidance for designing effective sub-graph-based graph classification methods.
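The instance-feature conversion that the abstract describes can be illustrated with a small sketch. This is a minimal, hypothetical Python illustration, not the authors' implementation. Each graph becomes a binary vector whose j-th entry records whether the graph contains the j-th sub-graph pattern; features are ranked by information gain; and a 1-nearest-neighbour classifier using Hamming distance labels a query graph. Assumptions: all function names and the toy data are invented for illustration; containment is a plain backtracking (non-induced) sub-graph test that is only practical for tiny graphs; the candidate patterns are written by hand rather than mined as frequent sub-graphs; `random_subgraph` uses a simple connected-growth scheme that may differ from the study's random sub-graph generator; and the SVM classifier is not shown.

```python
# Hypothetical sketch (not the authors' code): sub-graph features -> binary vectors
# -> information-gain ranking -> 1-NN. Names and toy data are illustrative only.
import math
import random
from collections import Counter

# A labelled undirected graph is (node_labels, edges): node i has label node_labels[i]
# and each edge is stored once as (u, v) with u < v.

def has_edge(edges, u, v):
    return (min(u, v), max(u, v)) in edges

def contains(graph, pattern):
    """Backtracking test: is `pattern` a (not necessarily induced) sub-graph of `graph`?"""
    g_labels, g_edges = graph
    p_labels, p_edges = pattern
    mapping = {}  # pattern node -> graph node (injective and label-preserving)

    def extend(i):
        if i == len(p_labels):
            return True
        for cand in range(len(g_labels)):
            if cand in mapping.values() or g_labels[cand] != p_labels[i]:
                continue
            # Every pattern edge between node i and an already-mapped node must exist in graph.
            if all(has_edge(g_edges, mapping[j], cand)
                   for j in range(i) if has_edge(p_edges, j, i)):
                mapping[i] = cand
                if extend(i + 1):
                    return True
                del mapping[i]
        return False

    return extend(0)

def random_subgraph(graph, size, rng):
    """Hypothetical sampler: grow a connected node set from a random seed node."""
    labels, edges = graph
    nodes = [rng.randrange(len(labels))]
    while len(nodes) < size:
        frontier = [v for a, b in edges for v in (a, b)
                    if v not in nodes and (a in nodes or b in nodes)]
        if not frontier:
            break
        nodes.append(rng.choice(frontier))
    new_id = {old: new for new, old in enumerate(nodes)}
    sub_edges = {(min(new_id[a], new_id[b]), max(new_id[a], new_id[b]))
                 for a, b in edges if a in new_id and b in new_id}
    return [labels[v] for v in nodes], sub_edges

def to_features(graph, patterns):
    """Binary instance-feature vector: 1 if the graph contains the pattern, else 0."""
    return [1 if contains(graph, p) else 0 for p in patterns]

def entropy(ys):
    n = len(ys)
    return -sum(c / n * math.log2(c / n) for c in Counter(ys).values())

def information_gain(column, ys):
    gain = entropy(ys)
    for value in (0, 1):
        part = [y for x, y in zip(column, ys) if x == value]
        if part:
            gain -= len(part) / len(ys) * entropy(part)
    return gain

def top_k_by_information_gain(X, ys, k):
    scores = [information_gain([row[j] for row in X], ys) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

def nearest_neighbour(X_train, y_train, x):
    """1-NN with Hamming distance between binary feature vectors."""
    dists = [sum(a != b for a, b in zip(row, x)) for row in X_train]
    return y_train[dists.index(min(dists))]

# Toy data (hypothetical): two classes of three-node chains.
train = [(["C", "O", "C"], {(0, 1), (1, 2)}), (["C", "C", "O"], {(0, 1), (1, 2)}),
         (["C", "N", "C"], {(0, 1), (1, 2)}), (["N", "C", "C"], {(0, 1), (1, 2)})]
y_train = ["pos", "pos", "neg", "neg"]
patterns = [(["C", "O"], {(0, 1)}), (["C", "N"], {(0, 1)}), (["C"], set())]

X_train = [to_features(g, patterns) for g in train]
keep = top_k_by_information_gain(X_train, y_train, k=1)

def project(row):
    return [row[j] for j in keep]

query = to_features((["O", "C"], {(0, 1)}), patterns)
pred = nearest_neighbour([project(r) for r in X_train], y_train, project(query))
print(X_train)  # [[1, 0, 1], [1, 0, 1], [0, 1, 1], [0, 1, 1]]
print(keep)     # [0]  (patterns 0 and 1 tie; the lower index is kept)
print(pred)     # pos

# Random sub-graph features: one connected 2-node pattern sampled from each training graph.
rng = random.Random(0)
rand_patterns = [random_subgraph(g, 2, rng) for g in train]
X_rand = [to_features(g, rand_patterns) for g in train]
```

Feeding `rand_patterns` instead of `patterns` into the same ranking and 1-NN steps gives the random sub-graph variant; only the source of the candidate features changes, which mirrors how the study compares feature types under a fixed learner.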