Transactional data are ubiquitous. Several methods, including frequent itemset mining and co-clustering, have been proposed to analyze transactional databases. In this work, we propose a new research problem: succinctly summarizing transactional databases. Solving this problem requires linking the high-level structure of the database to a potentially huge number of frequent itemsets. We formulate the problem as a set covering problem using overlapped hyperrectangles and prove that it, along with several of its variations, is NP-hard. We develop an approximation algorithm, HYPER, which achieves a ln(k) + 1 approximation ratio in polynomial time, together with a pruning strategy that significantly speeds it up. Additionally, we propose an efficient algorithm that further summarizes the set of hyperrectangles by allowing false positives. A detailed study on both real and synthetic datasets demonstrates the effectiveness and efficiency of our approaches in summarizing transactional databases.
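To make the set-cover formulation concrete, the following is a minimal sketch of the classic greedy weighted set cover applied to hyperrectangles. It is not HYPER itself. It assumes that a hyperrectangle is a pair (T, I) of transaction ids and items covering every cell (t, i) with t in T and i in I, that its cost is |T| + |I|, and that candidates are obtained by brute-force enumeration of small itemsets paired with their full support sets, standing in for frequent itemset mining. The helper names `cells`, `candidate_rectangles`, and `greedy_cover` are hypothetical. The greedy rule used here is the textbook one with the H(k) <= ln(k) + 1 guarantee for weighted set cover, where k is the largest number of elements any candidate covers.

```python
from itertools import combinations


def cells(db):
    """All (transaction id, item) pairs that the summary must cover."""
    return {(t, i) for t, row in enumerate(db) for i in row}


def candidate_rectangles(db, max_size=2, min_support=2):
    """Hypothetical stand-in for frequent-itemset mining: every itemset of up to
    max_size items, paired with ALL transactions that contain it.
    Singletons are always kept, so a complete cover is guaranteed to exist."""
    items = sorted({i for row in db for i in row})
    cands = []
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            tids = frozenset(t for t, row in enumerate(db) if set(itemset) <= row)
            if size == 1 or len(tids) >= min_support:
                cands.append((tids, frozenset(itemset)))
    return cands


def greedy_cover(db, cands):
    """Greedy weighted set cover: repeatedly pick the rectangle with the lowest
    cost per newly covered cell, assuming cost(T x I) = |T| + |I|."""
    uncovered = cells(db)
    chosen = []
    while uncovered:
        best, best_ratio = None, float("inf")
        for tids, itemset in cands:
            # Count only cells not already covered by earlier rectangles.
            new = sum((t, i) in uncovered for t in tids for i in itemset)
            if new == 0:
                continue
            ratio = (len(tids) + len(itemset)) / new
            if ratio < best_ratio:  # strict: ties keep the earlier candidate
                best, best_ratio = (tids, itemset), ratio
        tids, itemset = best
        uncovered -= {(t, i) for t in tids for i in itemset}
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"c"}]
    for tids, itemset in greedy_cover(db, candidate_rectangles(db, max_size=3)):
        print(sorted(itemset), sorted(tids))
    # ['a', 'b'] [0, 1, 2]
    # ['c'] [0, 1, 3]
```

On this toy database the nine cells are covered exactly by two overlapping-capable rectangles with total cost 9. Because every candidate pairs an itemset with transactions that actually contain it, the cover has no false positives; the pruning strategy and the false-positive summarization step described in the abstract are not shown.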