A log-linear approach to mining significant graph-relational patterns

Authors:
Christopher A. Besemann;Jianfei Wu;Anne M. Denton
Affiliations:
Department of Computer Science and Operations Research, North Dakota State University, Fargo, ND 58108-6050, USA and Microsoft, Fargo, ND 58104, USA;Department of Computer Science and Operations Research, North Dakota State University, Fargo, ND 58108-6050, USA and Microsoft, Fargo, ND 58104, USA;Department of Computer Science and Operations Research, North Dakota State University, Fargo, ND 58108-6050, USA
Venue:
Data & Knowledge Engineering
Year:
2011



Abstract

Objects in many application domains can be characterized as link-based data, which combine network (graph) information with structured information describing the nodes. Frequent pattern discovery in this setting is vulnerable to problems that cannot occur in pattern mining on conventional data without network information. Patterns may appear to reflect novel characteristics of a combination of graph and node information, yet be expected given patterns that conventional data mining techniques could already find. We introduce a significance measure that identifies patterns that are unexpected given node attributes in isolation and neighbor correlations. For this purpose, we extend a statistical log-linear model and account for the structural symmetry of link-based data. Eliminating insignificant results reduces the output quantity by orders of magnitude. Efficiency is achieved by designing the pattern mining algorithm as a hybrid of conventional pattern mining and graph data mining. We demonstrate the effectiveness and efficiency of the approach on yeast and movie data.
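
To illustrate the general idea of testing a pattern against a log-linear baseline, the following is a minimal sketch, not the measure defined in the paper. It assumes the simplest case: a 2x2 table counted over edges, where one margin records whether a node carries a hypothetical attribute A and the other whether its neighbor carries a hypothetical attribute B. It computes the likelihood-ratio statistic G² against the independence log-linear model. The function names, the counts, and the 0.05 threshold are assumptions made for illustration; the paper's model additionally accounts for neighbor correlations and the structural symmetry of link-based data, which this sketch ignores.

```python
import math

# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
CHI2_CRIT_1DF_05 = 3.841


def g_squared(n11, n10, n01, n00):
    """Likelihood-ratio statistic G^2 for a 2x2 table of edge counts.

    The expected counts come from the independence log-linear model,
    log(e_ij) = mu + alpha_i + beta_j, which for a 2x2 table reduces to
    e_ij = row_i * col_j / total.
    """
    observed = [[n11, n10], [n01, n00]]
    total = n11 + n10 + n01 + n00
    if total == 0:
        raise ValueError("table must contain at least one observation")
    row = [n11 + n10, n01 + n00]
    col = [n11 + n01, n10 + n00]

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # cells with zero observations contribute nothing
                e = row[i] * col[j] / total
                g2 += 2.0 * o * math.log(o / e)
    return g2


def is_significant(n11, n10, n01, n00, critical=CHI2_CRIT_1DF_05):
    """Return True when the table deviates from independence (illustrative threshold)."""
    return g_squared(n11, n10, n01, n00) > critical


if __name__ == "__main__":
    # Hypothetical counts: edges where (node has A, neighbor has B), and so on.
    print(g_squared(25, 25, 25, 25))
    print(is_significant(40, 10, 10, 40))
```

A pattern whose edge counts match the independence model would be filtered out as expected, while one that deviates strongly would be retained; this is the kind of filtering that shrinks the output in the approach described above.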