Nested Subtree Hash Kernels for Large-Scale Graph Classification over Streams

  • Authors:
  • Bin Li; Xingquan Zhu; Lianhua Chi; Chengqi Zhang

  • Venue:
  • ICDM '12 Proceedings of the 2012 IEEE 12th International Conference on Data Mining
  • Year:
  • 2012

Abstract

Most studies on graph classification focus on designing fast and effective kernels. Several fast subtree kernels achieve linear time complexity w.r.t. the number of edges, provided that a common feature space (e.g., a subtree pattern list) is available to represent all graphs. This is infeasible when graphs arrive in a stream with rapidly emerging subtree patterns. In this case, computing a kernel matrix for graphs over the entire stream is difficult, since the graphs in expired chunks cannot be projected again onto the ever-expanding feature space. This creates a serious problem for graph classification over streams: different portions of the graphs lie in different feature spaces. In this paper, we aim to enable large-scale graph classification over streams using the classical ensemble learning framework, which requires the data in different chunks to be in the same feature space. To this end, we propose a Nested Subtree Hashing (NSH) algorithm that recursively projects the multi-resolution subtree patterns of different chunks onto a set of common low-dimensional feature spaces. We theoretically analyze the derived NSH kernel and obtain a number of favorable properties: 1) The NSH kernel is an unbiased and highly concentrated estimator of the fast subtree kernel. 2) The bound on the convergence rate tends to tighten as the NSH algorithm steps into a higher resolution. 3) The NSH kernel is robust in tolerating concept drift between chunks over a stream. We also empirically test the NSH kernel on both a large-scale synthetic graph data set and a real-world chemical compound data set for anticancer activity prediction. The experimental results validate that the NSH kernel is indeed efficient and robust for graph classification over streams.
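The abstract gives no pseudocode, so the following is only a rough sketch of the general idea it describes: subtree patterns are grown one level per resolution, and at each resolution they are hashed into a fixed-size feature space so that graphs from any chunk share the same coordinates. It is not the authors' NSH algorithm. The function names (`nested_hash_features`, `hashed_kernel`), the MD5-based hash, the signed-bucket scheme, and the `dims` parameter are all assumptions made for illustration.

```python
import hashlib


def _bucket_and_sign(pattern, resolution, dim):
    """Deterministically map a pattern string to (bucket index, +1/-1 sign).

    MD5 is an illustrative choice; any fixed hash shared by all chunks would do.
    """
    digest = hashlib.md5(f"{resolution}:{pattern}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % dim
    sign = 1 if (digest[8] & 1) else -1
    return bucket, sign


def nested_hash_features(adj, labels, dims):
    """Return one hashed count vector per resolution.

    adj:    dict mapping node -> list of neighbour nodes (undirected graph)
    labels: dict mapping node -> initial node label
    dims:   feature-space size for each resolution (hypothetical parameter)
    """
    current = {v: str(lab) for v, lab in labels.items()}
    features = []
    for resolution, dim in enumerate(dims):
        if resolution > 0:
            # Grow each pattern by one level: own label plus sorted neighbour labels.
            current = {
                v: current[v] + "|" + ",".join(sorted(current[u] for u in adj.get(v, [])))
                for v in current
            }
        vec = [0] * dim
        hashed = {}
        for v, pattern in current.items():
            bucket, sign = _bucket_and_sign(pattern, resolution, dim)
            vec[bucket] += sign
            hashed[v] = str(bucket)
        # Nesting: the next resolution is built from the hashed labels,
        # so pattern strings never grow without bound.
        current = hashed
        features.append(vec)
    return features


def hashed_kernel(feats_a, feats_b):
    """Sum of per-resolution dot products between two graphs' hashed features."""
    return sum(x * y for va, vb in zip(feats_a, feats_b) for x, y in zip(va, vb))


# Two copies of the same labelled triangle with different node identifiers.
g1_adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
g1_labels = {0: "A", 1: "A", 2: "B"}
g2_adj = {"x": ["y", "z"], "y": ["x", "z"], "z": ["x", "y"]}
g2_labels = {"x": "B", "y": "A", "z": "A"}

dims = [16, 16, 16]
f1 = nested_hash_features(g1_adj, g1_labels, dims)
f2 = nested_hash_features(g2_adj, g2_labels, dims)
```

Because the hash functions and `dims` are fixed in advance, a graph from an expired chunk and a graph from a new chunk end up in the same feature spaces without any shared pattern list, which is the property an ensemble over chunks relies on.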