Discovering Significant Patterns

Authors: Geoffrey I. Webb
Affiliations: Faculty of Information Technology, Monash University, Clayton, Australia 3800
Venue: Machine Learning
Year: 2007

Abstract

Pattern discovery techniques, such as association rule discovery, explore large search spaces of potential patterns to find those that satisfy user-specified constraints. Because they consider so many patterns, they run an extreme risk of type-1 error, that is, of finding patterns that appear to satisfy the constraints on the sample data due to chance alone. This paper proposes techniques that overcome this problem by applying well-established statistical practices. These allow the user to enforce a strict upper limit on the risk of experimentwise error. Empirical studies demonstrate that standard pattern discovery techniques can discover numerous spurious patterns when applied to random data and, when applied to real-world data, produce large numbers of patterns that are rejected when subjected to sound statistical evaluation. The studies also reveal that a number of pragmatic choices about how such tests are performed can greatly affect their power.
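
The abstract does not name the statistical practices involved, so the following is only an illustrative sketch of the general problem. It assumes a Bonferroni adjustment as one well-established way to bound experimentwise error, and a one-sided Fisher exact test as the per-pattern test; both choices, along with the function names `fisher_one_sided` and `count_discoveries` and all parameter values, are assumptions made for illustration and are not taken from the paper. The sketch generates purely random binary data, tests every item pair for positive association, and counts how many pairs pass an unadjusted threshold versus the adjusted one.

```python
import itertools
import math
import random


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Returns the hypergeometric probability of observing at least `a`
    co-occurrences when the row and column totals are held fixed.
    """
    n = a + b + c + d
    row1 = a + b          # rows where the first item is present
    col1 = a + c          # rows where the second item is present
    denom = math.comb(n, col1)
    p = 0.0
    for k in range(a, min(row1, col1) + 1):
        p += math.comb(row1, k) * math.comb(n - row1, col1 - k) / denom
    return min(p, 1.0)


def count_discoveries(n_rows=500, n_items=20, alpha=0.05, seed=0):
    """Count item pairs declared 'significant' on independent random data.

    Every discovery here is spurious by construction, because the items
    are generated independently of one another.
    Returns (unadjusted_count, bonferroni_count).
    """
    rng = random.Random(seed)
    data = [[rng.random() < 0.5 for _ in range(n_items)]
            for _ in range(n_rows)]
    pairs = list(itertools.combinations(range(n_items), 2))

    p_values = []
    for i, j in pairs:
        a = sum(1 for row in data if row[i] and row[j])
        b = sum(1 for row in data if row[i] and not row[j])
        c = sum(1 for row in data if not row[i] and row[j])
        d = n_rows - a - b - c
        p_values.append(fisher_one_sided(a, b, c, d))

    # Unadjusted: each test is run at level alpha on its own.
    unadjusted = sum(p <= alpha for p in p_values)
    # Bonferroni (assumed here): divide alpha by the number of tests so
    # the chance of any false discovery is at most alpha.
    adjusted = sum(p <= alpha / len(pairs) for p in p_values)
    return unadjusted, adjusted


if __name__ == "__main__":
    unadjusted, adjusted = count_discoveries()
    print("unadjusted discoveries:", unadjusted)
    print("Bonferroni-adjusted discoveries:", adjusted)
```

With 20 items there are 190 candidate pairs, so roughly 0.05 × 190 unadjusted false discoveries would be expected on average (somewhat fewer in practice, since the exact test is conservative), whereas the adjusted threshold keeps the probability of even one false discovery at or below 0.05. The adjustment also makes each individual test stricter, which is one way that choices about how tests are performed can affect their power.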