Mining association rules between sets of items in large databases
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
A new framework for itemset generation
PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Efficient mining of emerging patterns: discovering trends and differences
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining frequent patterns without candidate generation
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Mining frequent patterns with counting inference
ACM SIGKDD Explorations Newsletter - Special issue on “Scalable data mining algorithms”
Empirical bayes screening for multi-item associations
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Detecting Group Differences: Mining Contrast Sets
Data Mining and Knowledge Discovery
The Space of Jumping Emerging Patterns and Its Incremental Maintenance Algorithms
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Selecting the right interestingness measure for association patterns
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Reducing multiclass to binary: a unifying approach for margin classifiers
The Journal of Machine Learning Research
CLOSET+: searching for the best strategies for mining frequent closed itemsets
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
On detecting differences between groups
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
On computing, storing and querying frequent patterns
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
In Defense of One-Vs-All Classification
The Journal of Machine Learning Research
Probability Estimates for Multi-class Classification by Pairwise Coupling
The Journal of Machine Learning Research
Relative risk and odds ratio: a data mining perspective
Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Mining top-K covering rule groups for gene expression data
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Fast Algorithms for Frequent Itemset Mining Using FP-Trees
IEEE Transactions on Knowledge and Data Engineering
LCM ver.3: collaboration of array, bitmap and prefix tree for frequent itemset mining
Proceedings of the 1st international workshop on open source data mining: frequent pattern mining implementations
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Minimum description length principle: generators are preferable to closed patterns
AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1
Adequate condensed representations of patterns
Data Mining and Knowledge Discovery
Efficient Mining of Jumping Emerging Patterns with Occurrence Counts for Classification
RSCTC '08 Proceedings of the 6th International Conference on Rough Sets and Current Trends in Computing
Efficient Discovery of Top-K Minimal Jumping Emerging Patterns
RSCTC '08 Proceedings of the 6th International Conference on Rough Sets and Current Trends in Computing
Mining Class Contrast Functions by Gene Expression Programming
ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
Efficient itemset generator discovery over a stream sliding window
Proceedings of the 18th ACM conference on Information and knowledge management
Efficient incremental mining of contrast patterns in changing data
Information Processing Letters
Feature construction based on closedness properties is not that simple
PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining
Local projection in jumping emerging patterns discovery in transaction databases
PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining
Transactions on rough sets XII
Mining contrast inequalities in numeric dataset
WAIM'10 Proceedings of the 11th international conference on Web-age information management
Intelligent Data Analysis
Efficient mining of jumping emerging patterns with occurrence counts for classification
Transactions on rough sets XIII
Using constraints to generate and explore higher order discriminative patterns
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
CPCQ: Contrast pattern based clustering quality index for categorical data
Pattern Recognition
Review: Situation identification techniques in pervasive computing: A review
Pervasive and Mobile Computing
A hierarchical approach to real-time activity recognition in body sensor networks
Pervasive and Mobile Computing
Clustering and understanding documents via discrimination information maximization
PAKDD'12 Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I
IEA/AIE'12 Proceedings of the 25th international conference on Industrial Engineering and Other Applications of Applied Intelligent Systems: advanced research in applied artificial intelligence
Information Sciences: an International Journal
Pattern-based real-time feedback for a temporal bone simulator
Proceedings of the 19th ACM Symposium on Virtual Reality Software and Technology
Key roles of closed sets and minimal generators in concise representations of frequent patterns
Intelligent Data Analysis
The support-confidence framework is the most common measure used in itemset mining algorithms because the anti-monotonicity of support effectively simplifies the search through the itemset lattice. As many previous studies have observed, however, this computational convenience introduces both quality and statistical flaws into the results. In this paper, we introduce a novel algorithm that produces itemsets ranked by their statistical merit under sophisticated test statistics such as chi-square, risk ratio and odds ratio. Our algorithm is based on the concept of equivalence classes. An equivalence class is a set of frequent itemsets that always occur together in the same set of transactions. These test statistics are computed from the transactions that contain an itemset. Consequently, all itemsets within an equivalence class share the same level of statistical significance, whichever test statistic is used. An equivalence class can be uniquely determined and concisely represented by a closed pattern and a set of generators. We therefore mine only closed patterns and generators, discovering both simultaneously in a single depth-first search. This simultaneous approach has not been exploited by any prior work. We evaluate our algorithm in two ways. In general, we compare it with LCM and FPclose, the best algorithms tailored for mining closed patterns only. In particular, we compare it with epMiner, the most recent algorithm for mining a type of relative-risk pattern known as minimal emerging patterns. Experimental results show that our algorithm is faster than all of them, sometimes by multiple orders of magnitude. These statistically ranked patterns, together with the algorithm's efficiency, have high potential for real-life applications. This is especially true in the biomedical and financial fields, where classical test statistics are of dominant interest.
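The equivalence-class idea can be illustrated with a minimal brute-force sketch in Python. This is not the depth-first algorithm described above, whose details are not given here. The sketch does three things:

- It groups the frequent itemsets of a hypothetical toy labelled dataset by their supporting transaction sets (tidsets).
- It reads off each class's closed pattern (its largest member) and generators (its minimal members).
- It scores each class once with chi-square, risk ratio and odds ratio.

The data, the function names and the exact 2×2 contingency-table definitions of the statistics are all assumptions made for illustration.

```python
import math
from itertools import combinations

# Hypothetical toy data: (items in the transaction, True if positive class).
DATA = [
    ({"a", "b", "c"}, True),
    ({"a", "b"}, True),
    ({"a", "b", "d"}, True),
    ({"c", "d"}, False),
    ({"b", "c", "d"}, False),
]

def tidset(itemset, data):
    """Indices of the transactions that contain every item of `itemset`."""
    return frozenset(i for i, (items, _) in enumerate(data) if itemset <= items)

def equivalence_classes(data, min_sup):
    """Brute force: group every non-empty frequent itemset by its tidset."""
    items = sorted(set().union(*(t for t, _ in data)))
    groups = {}
    for k in range(1, len(items) + 1):  # the empty itemset is skipped
        for combo in combinations(items, k):
            s = frozenset(combo)
            tids = tidset(s, data)
            if len(tids) >= min_sup:
                groups.setdefault(tids, []).append(s)
    classes = []
    for tids, members in groups.items():
        # All members share one tidset, so their union is the closed pattern.
        closed = frozenset().union(*members)
        # Generators are the minimal members of the class.
        generators = [m for m in members if not any(o < m for o in members)]
        classes.append((tids, closed, generators))
    return classes

def ratio(num, den):
    """Division returning inf for x/0 with x > 0 and nan for 0/0."""
    if den == 0:
        return math.inf if num > 0 else math.nan
    return num / den

def statistics(tids, data):
    """Score a tidset using an assumed 2x2 table (pattern present/absent x pos/neg)."""
    pos = {i for i, (_, label) in enumerate(data) if label}
    neg = set(range(len(data))) - pos
    a, b = len(tids & pos), len(tids & neg)  # pattern present
    c, d = len(pos) - a, len(neg) - b        # pattern absent
    n = a + b + c + d
    return {
        # P(pos | present) / P(pos | absent)
        "risk_ratio": ratio(a * (c + d), c * (a + b)),
        "odds_ratio": ratio(a * d, b * c),
        # Pearson chi-square for a 2x2 table, without continuity correction
        "chi2": ratio(n * (a * d - b * c) ** 2, (a + b) * (c + d) * (a + c) * (b + d)),
    }

if __name__ == "__main__":
    classes = equivalence_classes(DATA, min_sup=2)
    # Each class is scored once; every member would get identical scores.
    # (nan scores would make this ordering unreliable; none occur in DATA.)
    ranked = sorted(classes, key=lambda cls: statistics(cls[0], DATA)["chi2"], reverse=True)
    for tids, closed, gens in ranked:
        s = statistics(tids, DATA)
        print(sorted(closed), [sorted(g) for g in gens],
              f"chi2={s['chi2']:.3f} RR={s['risk_ratio']:.3f} OR={s['odds_ratio']:.3f}")
```

In this toy data, `{a}` and `{a, b}` occur in exactly the same transactions, so they form one class with closed pattern `{a, b}` and generator `{a}`, and the statistics need to be computed only once for both. A real miner would avoid enumerating every itemset. Per the abstract, the paper's algorithm finds closed patterns and generators directly during its depth-first search.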