Mining statistically important equivalence classes and delta-discriminative emerging patterns

Authors:
Jinyan Li; Guimei Liu; Limsoon Wong
Affiliations:
National University of Singapore; National University of Singapore; National University of Singapore
Venue:
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2007

Abstract

The support-confidence framework is the most common measure used in itemset mining algorithms because its antimonotonicity effectively simplifies the search lattice. As many previous studies have observed, this computational convenience introduces both quality and statistical flaws into the results. In this paper, we introduce a novel algorithm that produces itemsets ranked by statistical merit under sophisticated test statistics such as chi-square, risk ratio, and odds ratio. Our algorithm is based on the concept of equivalence classes. An equivalence class is a set of frequent itemsets that always occur together in the same set of transactions. Therefore, all itemsets within an equivalence class share the same level of statistical significance, regardless of which test statistic is used. Because an equivalence class can be uniquely determined and concisely represented by a closed pattern and a set of generators, we mine only closed patterns and generators, using a simultaneous depth-first search scheme. This parallel approach has not been exploited by any prior work. We evaluate our algorithm in two respects. In general, we compare it with LCM and FPclose, which are the best algorithms tailored for mining only closed patterns. In particular, we compare it with epMiner, which is the most recent algorithm for mining a type of relative risk patterns known as minimal emerging patterns. Experimental results show that our algorithm is faster than all of them, sometimes by multiple orders of magnitude. These statistically ranked patterns, together with the algorithm's efficiency, have high potential for real-life applications, especially in biomedical and financial fields, where classical test statistics are of dominant interest.
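To make the notion of an equivalence class concrete, here is a minimal Python sketch. It is not the paper's simultaneous depth-first algorithm: it enumerates every itemset by brute force, which is only feasible for tiny inputs. The toy transactions, the class labels `pos` and `neg`, and all function names are assumptions made for illustration. The sketch groups frequent itemsets by the set of transactions that contain them, reports each group's closed pattern and generators, and ranks the groups by a Pearson chi-square statistic computed once per group.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (items in the transaction, class label).
TRANSACTIONS = [
    (frozenset("abc"), "pos"),
    (frozenset("abc"), "pos"),
    (frozenset("abcd"), "pos"),
    (frozenset("bd"), "neg"),
    (frozenset("d"), "neg"),
    (frozenset("bd"), "neg"),
]


def equivalence_classes(transactions, min_support):
    """Group frequent non-empty itemsets by their supporting transaction ids.

    Brute-force enumeration for illustration; assumes min_support >= 1.
    """
    items = sorted(set().union(*(t for t, _ in transactions)))
    by_tidset = defaultdict(list)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            tids = frozenset(
                i for i, (t, _) in enumerate(transactions) if itemset <= t
            )
            if len(tids) >= min_support:
                by_tidset[tids].append(itemset)

    classes = {}
    for tids, members in by_tidset.items():
        # Closed pattern: the items common to every supporting transaction.
        closed = frozenset.intersection(*(transactions[i][0] for i in tids))
        # Generators: members with no proper subset inside the same class.
        generators = [m for m in members if not any(o < m for o in members)]
        classes[tids] = (closed, generators)
    return classes


def chi_square(tids, transactions, positive="pos"):
    """Pearson chi-square (no continuity correction) for pattern vs. class."""
    n = len(transactions)
    a = sum(1 for i in tids if transactions[i][1] == positive)  # pattern, pos
    b = len(tids) - a                                           # pattern, neg
    pos_total = sum(1 for _, label in transactions if label == positive)
    c = pos_total - a                                           # no pattern, pos
    d = n - pos_total - b                                       # no pattern, neg
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    # A zero margin makes the statistic undefined; report 0.0 in that case.
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def ranked_classes(transactions, min_support=2):
    """Return (score, closed pattern, generators), highest score first."""
    classes = equivalence_classes(transactions, min_support)
    scored = [
        (chi_square(tids, transactions), closed, gens)
        for tids, (closed, gens) in classes.items()
    ]
    return sorted(scored, key=lambda row: row[0], reverse=True)
```

On this toy data, `ranked_classes(TRANSACTIONS)` places the class with closed pattern `{a, b, c}` and generators `{a}` and `{c}` first. The six itemsets in that class (`a`, `c`, `ab`, `ac`, `bc`, `abc`) occur in exactly the same three transactions, so one chi-square computation covers all of them. This is why mining only the closed pattern and the generators loses no statistical information.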