SkyGraph: an algorithm for important subgraph discovery in relational graphs

  • Authors:
  • Apostolos N. Papadopoulos; Apostolos Lyritsis; Yannis Manolopoulos

  • Affiliations:
  • Data Engineering Research Lab., Department of Informatics, Aristotle University, Thessaloniki, Greece 54124 (all authors)

  • Venue:
  • Data Mining and Knowledge Discovery
  • Year:
  • 2008

Abstract

Many applications require effective and efficient manipulation of relational graphs in order to discover important patterns. Examples include: (i) analysis of microarray data in bioinformatics, (ii) pattern discovery in a large graph representing a social network, (iii) analysis of transportation networks, and (iv) community discovery in Web data. Existing methods typically apply mining techniques to graph data to discover important patterns, such as subgraphs that are likely to be useful. However, in some cases the number of mined patterns is large, which makes it difficult to select the most important ones. For example, when frequent subgraph mining is applied to a set of graphs, the system returns all connected subgraphs whose frequency is above a specified (usually user-defined) threshold. The number of discovered patterns may be large, and it depends on the data characteristics and the specified frequency threshold. It would be more convenient for the user if "goodness" criteria could be set to evaluate the usefulness of these patterns, and if the user could provide preferences to the system regarding the characteristics of the discovered patterns. In this paper, we propose a methodology that supports such preferences by applying subgraph discovery in relational graphs to retrieve important connected subgraphs. The importance of a subgraph is determined by: (i) the order of the subgraph (the number of vertices) and (ii) the subgraph edge connectivity. The performance of the proposed technique is evaluated using real-life as well as synthetically generated data sets.
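
The two importance criteria named in the abstract can be illustrated with a small sketch. It scores each candidate subgraph by its order and its edge connectivity, then keeps the candidates that no other candidate beats on both criteria. The dominance-based selection is an assumption suggested by the paper's title; the abstract does not state how the two criteria are combined. The brute-force connectivity routine, the function names, and the toy candidates are all hypothetical and are not the authors' algorithm; the routine is only practical for very small graphs.

```python
from itertools import combinations


def is_connected(vertices, edges):
    """Return True if the undirected graph (vertices, edges) is connected."""
    vertices = set(vertices)
    if not vertices:
        return False
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return seen == vertices


def edge_connectivity(vertices, edges):
    """Smallest number of edges whose removal disconnects the graph.

    Brute force over edge subsets (an assumption made for brevity);
    returns 0 for graphs that are trivial or already disconnected.
    """
    edges = list(edges)
    if len(set(vertices)) < 2 or not is_connected(vertices, edges):
        return 0
    for k in range(1, len(edges) + 1):
        for removed in combinations(range(len(edges)), k):
            kept = [e for i, e in enumerate(edges) if i not in removed]
            if not is_connected(vertices, kept):
                return k
    return len(edges)


def dominates(a, b):
    """a dominates b if it is at least as good on both criteria and not equal."""
    return a[0] >= b[0] and a[1] >= b[1] and a != b


def non_dominated(candidates):
    """Score candidates as (order, edge connectivity); drop dominated ones."""
    scores = {
        name: (len(set(vs)), edge_connectivity(vs, es))
        for name, (vs, es) in candidates.items()
    }
    return {
        name: s
        for name, s in scores.items()
        if not any(dominates(t, s) for t in scores.values())
    }


# Hypothetical candidate subgraphs: name -> (vertices, edges).
candidates = {
    "triangle": ("abc", [("a", "b"), ("b", "c"), ("c", "a")]),
    "k4": ("abcd", list(combinations("abcd", 2))),
    "cycle5": ("abcde", [("a", "b"), ("b", "c"), ("c", "d"),
                         ("d", "e"), ("e", "a")]),
    "path6": ("abcdef", [("a", "b"), ("b", "c"), ("c", "d"),
                         ("d", "e"), ("e", "f")]),
}

result = non_dominated(candidates)
for name, (order, conn) in sorted(result.items()):
    print(f"{name}: order={order}, edge connectivity={conn}")
```

In this toy input the triangle is discarded because the 4-clique is larger and more strongly connected, while the 4-clique, the 5-cycle, and the 6-vertex path each survive: none of them is beaten on both criteria at once. For graphs of realistic size, a polynomial-time minimum-cut method would replace the brute-force routine.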