Learning block-preserving graph patterns and its application to data mining

  • Authors:
  • Hitoshi Yamasaki;Yosuke Sasaki;Takayoshi Shoudai;Tomoyuki Uchida;Yusuke Suzuki

  • Affiliations:
  • Department of Informatics, Kyushu University, Fukuoka, Japan 819-0395;Department of Informatics, Kyushu University, Fukuoka, Japan 819-0395;Department of Informatics, Kyushu University, Fukuoka, Japan 819-0395;Department of Intelligent Systems, Hiroshima City University, Hiroshima, Japan 731-3194;Department of Intelligent Systems, Hiroshima City University, Hiroshima, Japan 731-3194

  • Venue:
  • Machine Learning
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Recently, due to the rapid growth of electronic data having graph structures such as HTML and XML texts and chemical compounds, many researchers have been interested in data mining and machine learning techniques for finding useful patterns from graph-structured data (graph data). Since graph data contain a huge number of substructures and it tends to be computationally expensive to decide whether or not such data have given structural features, graph mining problems face computational difficulties. Let ${\mathcal{C}}$ be a graph class which satisfies a connected hereditary property and contains infinitely many different biconnected graphs, and for which a special kind of the graph isomorphism problem can be computed in polynomial time. In this paper, we consider learning and mining problems for ${\mathcal{C}}$ . Firstly, we define a new graph pattern, which is called a block preserving graph pattern (bp-graph pattern) for ${\mathcal{C}}$ . Secondly, we present a polynomial time algorithm for deciding whether or not a given bp-graph pattern matches a given graph in ${\mathcal{C}}$ . Thirdly, by giving refinement operators over bp-graph patterns, we present a polynomial time algorithm for finding a minimally generalized bp-graph pattern for ${\mathcal{C}}$ . Outerplanar graphs are planar graphs which can be embedded in the plane in such a way that all of vertices lie on the outer boundary. Many pharmacologic chemical compounds are known to be represented by outerplanar graphs. The class of connected outerplanar graphs ${\mathcal{O}}$ satisfies the above conditions for ${\mathcal{C}}$ . Next, we propose two incremental polynomial time algorithms for enumerating all frequent bp-graph patterns with respect to a given finite set of graphs in ${\mathcal{O}}$ . Finally, by reporting experimental results obtained by applying the two graph mining algorithms to a subset of the NCI dataset, we evaluate the performance of the two graph mining algorithms.