This paper considers the framework of the so-called "market basket problem", in which a database of transactions is mined for the occurrence of unusually frequent item sets. In our case, "unusually frequent" is defined through estimates of the frequency of each item set divided by a baseline frequency computed as if items occurred independently. The focus is on obtaining reliable estimates of this measure of interestingness for all item sets, even item sets with relatively low frequencies. For example, in a medical database of patient histories, unusual item sets including the item "patient death" (or another serious adverse event) should ideally be flagged with as few as 5 or 10 occurrences of the item set; it is unacceptable to require that item sets occur in as many as 0.1% of millions of patient reports before the data mining algorithm detects a signal. Similar considerations apply in fraud detection applications. We therefore abandon the requirement that interesting item sets must have a relatively large fixed minimum support, and instead adopt a criterion based on the results of fitting an empirical Bayes model to the item set counts. The model allows us to define a 95% Bayesian lower confidence limit for the "interestingness" measure of every item set, and the item sets can then be ranked according to their empirical Bayes confidence limits. For item sets of size J > 2, we also distinguish between multi-item associations that can be explained by the observed J(J-1)/2 pairwise associations and item sets that are significantly more frequent than their pairwise associations would suggest. Such item sets can uncover complex or synergistic mechanisms generating multi-item associations. This methodology has been applied within the U.S. Food and Drug Administration (FDA) to databases of adverse drug reaction reports and within AT&T to customer international calling histories. We also present graphical techniques for exploring and understanding the modeling results.
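As a rough illustration of the interestingness measure described above, the following Python sketch computes, for each item set, the observed count and the count expected if items occurred independently, and then shrinks the ratio toward a prior. It is a minimal sketch under stated assumptions, not the paper's model: the single Gamma(alpha, beta) prior with fixed hyperparameters, the Monte Carlo quantile, and the names `ratio_to_independence` and `shrunken_ratio` are all hypothetical choices made for this example. The abstract states only that an empirical Bayes model is fitted to the item set counts and does not specify its form.

```python
import random
from collections import Counter
from itertools import combinations


def ratio_to_independence(transactions, max_size=2):
    """Map each observed item set (size 2..max_size) to (observed, expected).

    "expected" is the count the item set would have if its items occurred
    independently with their observed marginal frequencies.
    Item sets that never occur together are not listed.
    """
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    observed = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(2, max_size + 1):
            for combo in combinations(items, size):
                observed[combo] += 1
    result = {}
    for itemset, count in observed.items():
        expected = float(n)
        for item in itemset:
            expected *= item_counts[item] / n
        result[itemset] = (count, expected)
    return result


def shrunken_ratio(count, expected, alpha=1.0, beta=1.0, draws=20000, seed=0):
    """Posterior mean and approximate 5% quantile of the observed/expected ratio.

    ASSUMPTION: count ~ Poisson(ratio * expected) with a Gamma(alpha, beta)
    prior on the ratio, where alpha and beta are fixed by hand rather than
    estimated from the data. The posterior is then Gamma(alpha + count,
    rate = beta + expected); its 5% quantile is estimated by sampling.
    """
    shape = alpha + count
    rate = beta + expected
    mean = shape / rate
    rng = random.Random(seed)
    samples = sorted(rng.gammavariate(shape, 1.0 / rate) for _ in range(draws))
    lower = samples[int(0.05 * draws)]
    return mean, lower


# Toy data, invented for this example.
transactions = [
    ["a", "b"], ["a", "b"], ["a", "c"], ["b"], ["c"], ["a", "b", "c"],
]
stats = ratio_to_independence(transactions, max_size=3)

# Rank item sets by the lower limit rather than by the raw ratio, so that
# item sets with tiny counts are not ranked on noise alone.
ranked = sorted(
    stats.items(),
    key=lambda kv: shrunken_ratio(kv[1][0], kv[1][1])[1],
    reverse=True,
)
for itemset, (obs, exp) in ranked:
    mean, lower = shrunken_ratio(obs, exp)
    print(itemset, obs, round(exp, 3), round(mean, 3), round(lower, 3))
```

Ranking on the lower limit instead of the raw ratio is the point of the design: an item set seen once where 0.01 occurrences were expected has a huge raw ratio but a wide posterior, so its lower limit stays modest. In a real empirical Bayes analysis the prior parameters would be estimated from the full collection of item set counts rather than fixed as here.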