Screening and interpreting multi-item associations based on log-linear modeling

Authors:
Xintao Wu; Daniel Barbará; Yong Ye
Affiliations:
UNC at Charlotte, Charlotte, NC; George Mason University, Fairfax, VA; UNC at Charlotte, Charlotte, NC
Venue:
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2003

References (11):
Mining association rules between sets of items in large databases. SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Implementing data cubes efficiently. SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Mining frequent patterns without candidate generation. SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Independence is good: dependency-based histogram synopses for high-dimensional data. SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Empirical bayes screening for multi-item associations. Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Loglinear-Based Quasi Cubes. Journal of Intelligent Information Systems
Beyond Market Baskets: Generalizing Association Rules to Dependence Rules. Data Mining and Knowledge Discovery
Discovery-Driven Exploration of OLAP Data Cubes. EDBT '98 Proceedings of the 6th International Conference on Extending Database Technology: Advances in Database Technology
Fast Algorithms for Mining Association Rules in Large Databases. VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Modeling and Imputation of Large Incomplete Multidimensional Datasets. DaWaK 2000 Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery
Probabilistic Models for Query Approximation with Large Sparse Binary Data Sets. UAI '00 Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence

Cited by (9):
Graphical modeling based gene interaction analysis for microarray data. ACM SIGKDD Explorations Newsletter
Exploring gene causal interactions using an enhanced constraint-based method. Pattern Recognition
Self-sufficient itemsets: An approach to screening potentially interesting associations between items. ACM Transactions on Knowledge Discovery from Data (TKDD)
On addressing accuracy concerns in privacy preserving association rule mining. PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining
A statistical interestingness measures for XML based association rules. PRICAI'10 Proceedings of the 11th Pacific Rim international conference on Trends in artificial intelligence
A log-linear approach to mining significant graph-relational patterns. Data & Knowledge Engineering
Efficient causal interaction learning with applications in microarray. ISMIS'05 Proceedings of the 15th international conference on Foundations of Intelligent Systems
Examining Multi-factor Interactions in Microblogging Based on Log-linear Modeling. ASONAM '12 Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012)
Interestingness measures for association rules within groups. Intelligent Data Analysis

Abstract

Association rules have received a lot of attention in the data mining community since their introduction. The classical approach of finding rules whose items enjoy high support (that is, appear together in many of the transactions in the data set) has, however, significant shortcomings. It has been shown that support can be a misleading indicator of how interesting a rule is. Alternative measures, such as lift, have been proposed. More recently, a paper by DuMouchel et al. proposed the use of all-two-factor loglinear models to discover sets of items that cannot be explained by pairwise associations between the items involved. This approach has its own limitation: it stops short of considering higher-order interactions (beyond pairwise) among the items. In this paper, we propose a method that examines the parameters of the fitted loglinear models to find all the significant association patterns among the items. Since fitting loglinear models to large data sets can be computationally prohibitive, we apply graph-theoretical results to divide the original set of items into components (sets of items) that are statistically independent of each other. We then apply loglinear modeling to each component and find the interesting associations among the items in it. The technique is experimentally evaluated on a real data set (insurance data) and a series of synthetic data sets. The results show that the technique is effective in finding interesting associations among the items involved.
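
The abstract does not spell out the graph-theoretical procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that two items are linked when a Pearson chi-square test on their 2x2 presence/absence table exceeds 3.84 (the 0.05 critical value at one degree of freedom), and that components are the connected components of the resulting graph. The function names, the threshold choice, and the toy transactions are all hypothetical. The sketch also computes lift, the alternative measure mentioned above.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def lift(transactions, a, b):
    """Joint support divided by the support expected under independence."""
    joint = support(transactions, {a, b})
    return joint / (support(transactions, {a}) * support(transactions, {b}))


def chi_square_2x2(transactions, a, b):
    """Pearson chi-square statistic for the presence/absence table of a and b."""
    n = len(transactions)
    n_a = sum(a in t for t in transactions)
    n_b = sum(b in t for t in transactions)
    n_ab = sum(a in t and b in t for t in transactions)
    observed = {
        (1, 1): n_ab,
        (1, 0): n_a - n_ab,
        (0, 1): n_b - n_ab,
        (0, 0): n - n_a - n_b + n_ab,
    }
    stat = 0.0
    for (i, j), obs in observed.items():
        row_total = n_a if i else n - n_a
        col_total = n_b if j else n - n_b
        expected = row_total * col_total / n
        if expected > 0:  # skip degenerate cells (item always or never present)
            stat += (obs - expected) ** 2 / expected
    return stat


def independent_components(transactions, items, threshold=3.84):
    """Connected components of the graph linking pairwise-dependent items.

    Assumption: pairwise chi-square tests stand in for whatever dependence
    criterion the paper actually uses.
    """
    parent = {item: item for item in items}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(items, 2):
        if chi_square_2x2(transactions, a, b) > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for item in items:
        groups.setdefault(find(item), set()).add(item)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical toy data: bread/butter always co-occur, beer/chips always
# co-occur, and the two pairs are independent of each other.
transactions = (
    [{"bread", "butter", "beer", "chips"}] * 10
    + [{"bread", "butter"}] * 10
    + [{"beer", "chips"}] * 10
    + [set()] * 10
)
items = ["bread", "butter", "beer", "chips"]

components = independent_components(transactions, items)
print(lift(transactions, "bread", "butter"))  # 2.0
print(lift(transactions, "bread", "beer"))    # 1.0
print([sorted(g) for g in components])        # [['beer', 'chips'], ['bread', 'butter']]
```

In this sketch each component could then be modeled separately, which is what keeps the loglinear fitting tractable: the cost grows with the size of the largest component rather than with the full item set.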