A log-linear approach to mining significant graph-relational patterns

  • Authors:
  • Christopher A. Besemann;Jianfei Wu;Anne M. Denton

  • Affiliations:
  • Department of Computer Science and Operations Research, North Dakota State University, Fargo, ND 58108-6050, USA and Microsoft, Fargo, ND 58104, USA;Department of Computer Science and Operations Research, North Dakota State University, Fargo, ND 58108-6050, USA and Microsoft, Fargo, ND 58104, USA;Department of Computer Science and Operations Research, North Dakota State University, Fargo, ND 58108-6050, USA

  • Venue:
  • Data & Knowledge Engineering
  • Year:
  • 2011

Abstract

Objects in many application domains can be characterized as link-based data, which combine network (graph) information with structured information describing the nodes. Discovery of frequent patterns in this setting is vulnerable to problems that cannot occur in pattern mining on conventional data without network information. Patterns may appear to reflect novel characteristics of a combination of graph and node information even though they are expected based on patterns that could be found using conventional data mining techniques. We introduce a significance measure that identifies patterns that are unexpected given node attributes in isolation and neighbor correlations. A statistical log-linear model is extended for this purpose, and the structural symmetry of the link-based data is accounted for. Eliminating insignificant results reduces the output quantity by orders of magnitude. Efficiency is achieved by designing the pattern mining algorithm as a hybrid of conventional pattern mining and graph data mining. We demonstrate the effectiveness and efficiency of the approach on yeast and movie data.
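
To make the idea of "unexpected given simpler patterns" concrete, the following is a minimal sketch of a textbook log-linear independence test on a single edge pattern. It is not the significance measure of the paper: it ignores neighbor correlations beyond one edge and does not reproduce the paper's treatment of structural symmetry. The function name `independence_g2`, the toy graph, and the attribute names "a" and "b" are all hypothetical and chosen only for illustration. The sketch counts how often a node carrying attribute "a" links to a node carrying attribute "b", fits the independence model log m_xy = lambda + lambda_x + lambda_y (whose fitted counts are row total times column total divided by N), and reports the likelihood-ratio statistic G^2.

```python
import math
from collections import Counter


def independence_g2(edges, labels, a, b):
    """Likelihood-ratio statistic G^2 for a 2x2 edge table.

    Each edge (u, v) is treated as an ordered pair (an assumption of this
    sketch). The table cross-classifies "u has attribute a" against
    "v has attribute b". Expected counts come from the independence
    log-linear model: m_xy = row_x * col_y / N.
    Returns (G^2, observed count, expected count) for the (a, b) cell.
    """
    counts = Counter()
    for u, v in edges:
        counts[(a in labels[u], b in labels[v])] += 1

    n = sum(counts.values())
    if n == 0:
        raise ValueError("at least one edge is required")

    g2 = 0.0
    expected = {}
    for x in (True, False):
        for y in (True, False):
            row = counts[(x, True)] + counts[(x, False)]
            col = counts[(True, y)] + counts[(False, y)]
            expected[(x, y)] = row * col / n
            obs = counts[(x, y)]
            if obs > 0:  # cells with zero observations contribute nothing
                g2 += 2.0 * obs * math.log(obs / expected[(x, y)])
    return g2, counts[(True, True)], expected[(True, True)]


# Hypothetical toy data: two a-b edges and two edges between unlabeled nodes.
labels = {1: {"a"}, 2: {"b"}, 3: {"a"}, 4: {"b"},
          5: set(), 6: set(), 7: set(), 8: set()}
edges = [(1, 2), (3, 4), (5, 6), (7, 8)]

g2, observed, expected_ab = independence_g2(edges, labels, "a", "b")
print(g2, observed, expected_ab)
```

A large G^2 indicates that the edge pattern occurs more or less often than the node attribute frequencies alone would predict. A pattern whose observed count is close to its expected count would be a candidate for elimination as insignificant, which is the kind of pruning the abstract describes.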