On the discovery of significant statistical quantitative rules

Authors:
Hong Zhang;Balaji Padmanabhan;Alexander Tuzhilin
Affiliations:
University of Pennsylvania, Philadelphia, PA;University of Pennsylvania, Philadelphia, PA;New York University, New York, NY
Venue:
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2004


Abstract

We study market share rules: rules that have a market share statistic associated with them. Such rules are particularly relevant to business decision making. Motivated by market share rules, we consider statistical quantitative rules (SQ rules), quantitative rules in which the RHS can be any statistic computed for the segment satisfying the LHS of the rule. Building on prior work, we present a statistical approach for learning all significant SQ rules, i.e., SQ rules for which a desired statistic lies outside a confidence interval computed for this rule. In particular, we show how resampling techniques can be used effectively to learn significant rules. Because our method tests the significance of a large number of rules in parallel, it is susceptible to learning a certain number of "false" rules. To address this, we present a technique for determining the number of significant SQ rules that can be expected by chance alone, and suggest that this number can be used to determine a "false discovery rate" for the learning procedure. We apply our methods to online consumer purchase data and report the results.
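
The significance test described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm; it only assumes that the statistic is the segment mean, that the confidence interval is a percentile interval obtained by resampling under independence of LHS and RHS, and that chance discoveries are counted by rerunning the test on shuffled RHS values. All function names and parameters (permutation_interval, is_significant, count_chance_rules, n_perm, alpha) are hypothetical.

```python
import random
import statistics


def permutation_interval(values, segment_size, n_perm=1000, alpha=0.05, rng=None):
    """Percentile interval for the mean of a random segment of the given size.

    Assumption: drawing segment_size values without replacement stands in
    for permuting the RHS attribute and re-reading the segment.
    """
    rng = rng or random.Random(0)
    stats = sorted(
        statistics.fmean(rng.sample(values, segment_size)) for _ in range(n_perm)
    )
    lo = stats[int((alpha / 2) * n_perm)]
    hi = stats[int((1 - alpha / 2) * n_perm) - 1]
    return lo, hi


def is_significant(in_segment, values, **kw):
    """Test one rule: in_segment[i] is True when record i satisfies the LHS.

    The segment must be non-empty. Returns (significant, observed, (lo, hi)).
    """
    segment = [v for v, m in zip(values, in_segment) if m]
    observed = statistics.fmean(segment)
    lo, hi = permutation_interval(values, len(segment), **kw)
    return observed < lo or observed > hi, observed, (lo, hi)


def count_chance_rules(segments, values, rng, **kw):
    """Count rules that test significant after the RHS values are shuffled.

    Assumption: on shuffled data every "significant" rule is a false one,
    so this count estimates the discoveries expected by chance alone.
    """
    shuffled = values[:]
    rng.shuffle(shuffled)
    return sum(is_significant(seg, shuffled, rng=rng, **kw)[0] for seg in segments)
```

Under these assumptions, dividing the count from count_chance_rules (ideally averaged over several shuffles) by the number of rules found significant on the real data gives a rough estimate of the false discovery rate.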