Dual active feature and sample selection for graph classification

Authors:
Xiangnan Kong; Wei Fan; Philip S. Yu
Affiliations:
University of Illinois at Chicago, Chicago, IL, USA; IBM T. J. Watson Research, Hawthorne, NY, USA; University of Illinois at Chicago, Chicago, IL, USA
Venue:
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2011

Abstract

Graph classification has become an important and active research topic in the last decade. Current research on graph classification focuses on mining discriminative subgraph features under supervised settings, on the basic assumption that a large number of labeled graphs are available. However, labeling graph data is expensive and time-consuming in many real-world applications. To reduce the labeling cost for graph data, we address the problem of selecting the most important graph to query for its label. This problem is challenging and differs from conventional active learning problems because graph data have no predefined feature vectors, and the subgraph enumeration problem is NP-hard. Moreover, the active sample selection problem and the feature selection problem are correlated for graph data: before we can solve the active sample selection problem, we need to find a set of optimal subgraph features. To address this challenge, we demonstrate how one can simultaneously estimate the usefulness of a query graph and a set of subgraph features. The idea is to maximize the dependency between subgraph features and graph labels within an active learning framework. We propose a branch-and-bound algorithm to search for the optimal query graph and optimal features simultaneously. Empirical studies on nine real-world tasks demonstrate that the proposed method can obtain better accuracy on graph data than alternative approaches.
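
To make the idea of "dependency between subgraph features and graph labels" concrete, the following is a minimal sketch under stated assumptions, not the method of the paper. It assumes that each graph is already encoded as a binary vector indicating which candidate subgraphs it contains, and it uses the empirical Hilbert-Schmidt Independence Criterion (HSIC) with linear kernels as one possible dependence measure; the abstract does not say which measure is used. The function names and the toy data are hypothetical. The branch-and-bound search and the choice of query graph are not reproduced here.

```python
import numpy as np


def hsic_linear(X, y):
    """Empirical HSIC between features X (n x m) and labels y (n,),
    using linear kernels K = X X^T and L = y y^T (an assumed choice)."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    K = X @ X.T
    L = np.outer(y, y)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


def hsic_feature_scores(X, y):
    """Per-feature contribution to hsic_linear.

    With linear kernels, tr(K H L H) = sum_j (f_j^T H y)^2, where f_j is
    column j of X, so the total dependence decomposes over features."""
    n = X.shape[0]
    centered_y = y - y.mean()  # equals H @ y
    return (X.T @ centered_y) ** 2 / (n - 1) ** 2


def top_k_features(X, y, k):
    """Indices of the k features with the largest dependence score."""
    scores = hsic_feature_scores(X, y)
    return np.argsort(-scores)[:k]


# Hypothetical toy data: 6 labeled graphs, 3 candidate subgraph features.
# Column 0 tracks the label, column 1 occurs in every graph, column 2 is noisy.
X = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
    [0, 1, 1],
], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1], dtype=float)

scores = hsic_feature_scores(X, y)
best = top_k_features(X, y, k=1)
```

In this toy setting the subgraph that appears in every graph receives a score of zero, because a constant feature carries no information about the label, while the subgraph that occurs only in the positive graphs receives the highest score. The decomposition over features is specific to the linear-kernel assumption made here.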