Contrast set mining is a data mining task that aims to discern differences among groups. These groups can be patients, organizations, molecules, or even time-lines, and are defined by a selected property that distinguishes one from another. A contrast set is a conjunction of attribute-value pairs whose distribution differs significantly across groups. The search for contrast sets can be prohibitively expensive on relatively large datasets because every combination of attribute-value pairs must be examined, so the search space can grow exponentially. In this paper, we introduce the notion of a correlated group difference (CGD) and propose a contrast set mining technique that uses mutual information and all-confidence to select the most highly correlated attribute-value pairs in order to mine CGDs. Our experiments on real datasets demonstrate the efficiency of our approach and the interestingness of the CGDs discovered.
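To make the quantities named in the abstract concrete, the following is a minimal sketch of how support, all-confidence, mutual information, and a between-group support difference could be computed for attribute-value pairs. It uses the standard textbook definitions and is not the paper's algorithm; the function names (`support`, `all_confidence`, `mutual_information`, `support_difference`), the `group` column, and the toy records are all hypothetical and chosen only for illustration.

```python
from math import log2

def support(rows, items):
    """Fraction of rows that contain every (attribute, value) pair in items."""
    if not rows:
        return 0.0
    hits = sum(1 for r in rows if all(r.get(a) == v for a, v in items))
    return hits / len(rows)

def all_confidence(rows, items):
    """Support of the whole itemset divided by the largest single-item support."""
    max_single = max(support(rows, [it]) for it in items)
    return support(rows, items) / max_single if max_single > 0 else 0.0

def mutual_information(rows, attr_x, attr_y):
    """Mutual information (in bits) between two categorical attributes."""
    n = len(rows)
    joint, px, py = {}, {}, {}
    for r in rows:
        x, y = r[attr_x], r[attr_y]
        joint[(x, y)] = joint.get((x, y), 0) + 1
        px[x] = px.get(x, 0) + 1
        py[y] = py.get(y, 0) + 1
    # Sum p(x,y) * log2(p(x,y) / (p(x) * p(y))) over observed value pairs.
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

def support_difference(rows, group_attr, g1, g2, items):
    """Difference in itemset support between two groups (hypothetical helper)."""
    rows1 = [r for r in rows if r[group_attr] == g1]
    rows2 = [r for r in rows if r[group_attr] == g2]
    return support(rows1, items) - support(rows2, items)

# Toy data (invented for illustration): two groups of four records each.
rows = [
    {"group": "A", "smoker": "yes", "cough": "yes"},
    {"group": "A", "smoker": "yes", "cough": "yes"},
    {"group": "A", "smoker": "yes", "cough": "no"},
    {"group": "A", "smoker": "no",  "cough": "no"},
    {"group": "B", "smoker": "no",  "cough": "no"},
    {"group": "B", "smoker": "no",  "cough": "no"},
    {"group": "B", "smoker": "no",  "cough": "yes"},
    {"group": "B", "smoker": "yes", "cough": "yes"},
]
candidate = [("smoker", "yes"), ("cough", "yes")]

mi = mutual_information(rows, "smoker", "cough")
ac = all_confidence(rows, candidate)                           # 0.75
diff = support_difference(rows, "group", "A", "B", candidate)  # 0.25
print(mi, ac, diff)
```

In a sketch like this, correlation measures such as `mi` and `ac` would be used to decide which attribute-value pairs are worth combining before the group comparison is made, which is one way to avoid examining every combination. The thresholds and the search strategy are left out because the abstract does not specify them.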