Discovering frequent pattern pairs

Authors:
Carlos Ordonez; Zhibo Chen
Affiliations:
Department of Computer Science, University of Houston, Houston, TX, USA; Department of Computer Science, University of Houston, Houston, TX, USA
Venue:
Intelligent Data Analysis
Year:
2013

Abstract

Cubes and association rules discover frequent patterns in a data set, but most of those patterns are not significant. Previous research has therefore introduced search constraints and statistical metrics to discover significant patterns and reduce processing time. We introduce two pattern pair generalizations: cube pairs, which compare cube groups using a parametric statistical test and generalize cubes, and rule pairs, which are built from two similar association rules and generalize association rules. We introduce algorithmic optimizations to discover comparable pattern sets. We study why the two techniques agree or disagree on the validity of specific pairs, considering the p-value for statistical tests and the confidence for association rules. In addition, we analyze the probability distribution of target attributes given confidence thresholds. We also introduce a reliability metric based on cross-validation, which enables an objective comparison between the two kinds of patterns. We present an extensive experimental evaluation with real data sets to understand the significance and reliability of pattern pairs. We show that cube pairs generally produce more reliable results than rule pairs.
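
The two quantities the abstract contrasts, confidence for rules and a p-value for cube groups, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' method. The abstract does not name the parametric test, so a large-sample two-sided z-test on the difference in means stands in for it. A "rule pair" is taken to be two rules whose antecedents differ in one item. The function names, attribute names, and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def confidence(transactions, antecedent, consequent):
    """Confidence of antecedent -> consequent: supp(X and Y) / supp(X)."""
    antecedent, consequent = set(antecedent), set(consequent)
    covered = [t for t in transactions if antecedent <= set(t)]
    if not covered:
        return 0.0
    hits = sum(1 for t in covered if consequent <= set(t))
    return hits / len(covered)


def two_group_p_value(group_a, group_b):
    """Two-sided p-value for a difference in means between two groups.

    Assumption: a normal approximation (z-test) stands in for whichever
    parametric test the paper uses. Each group needs at least two values
    and the groups must not both have zero variance.
    """
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    z = (mean(group_a) - mean(group_b)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical transactions: each one is a set of attribute values.
transactions = [
    {"smoker", "age>60", "disease"},
    {"smoker", "age>60", "disease"},
    {"smoker", "age>60"},
    {"smoker", "age<=60", "disease"},
    {"smoker", "age<=60"},
    {"smoker", "age<=60"},
]

# A hypothetical rule pair: same consequent, antecedents differ in one item.
rule_pair = (
    confidence(transactions, {"smoker", "age>60"}, {"disease"}),
    confidence(transactions, {"smoker", "age<=60"}, {"disease"}),
)

# A hypothetical cube pair: one numeric measure observed in two cube groups.
group_a = [0.90, 0.80, 0.85, 0.95, 0.90]
group_b = [0.40, 0.50, 0.45, 0.35, 0.50]
cube_pair_p = two_group_p_value(group_a, group_b)

print(rule_pair, cube_pair_p)
```

In this toy setting the rule pair is judged by comparing two confidence values against a threshold, while the cube pair is judged by a single p-value, which is one way the two techniques can disagree on the same data.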