Succinct summarization of transactional databases: an overlapped hyperrectangle scheme

Authors:
Yang Xiang; Ruoming Jin; David Fuhry; Feodor F. Dragan
Affiliations:
Kent State University, Kent, OH, USA (all authors)
Venue:
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2008

Abstract

Transactional data are ubiquitous. Several methods, including frequent itemset mining and co-clustering, have been proposed to analyze transactional databases. In this work, we propose a new research problem: succinctly summarizing transactional databases. Solving this problem requires linking the high-level structure of the database to a potentially huge number of frequent itemsets. We formulate this problem as a set covering problem using overlapped hyperrectangles, and we prove that this problem and several of its variations are NP-hard. We develop an approximation algorithm, HYPER, which achieves an ln(k) + 1 approximation ratio in polynomial time. We also propose a pruning strategy that can significantly speed up HYPER. Additionally, we propose an efficient algorithm that further summarizes the set of hyperrectangles by allowing false positive conditions. A detailed study using both real and synthetic datasets shows the effectiveness and efficiency of our approaches in summarizing transactional databases.
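The set-covering formulation above can be made concrete with a small sketch. It assumes a database is a dict from transaction id to item set, that a hyperrectangle is a pair (T, I) of transactions and items whose cells must all be present in the database, and that its cost is |T| + |I|; these choices, the candidate generator `candidate_rects` (small itemsets paired with their full support sets), and all function names are hypothetical illustrations. The selection loop is the textbook greedy rule for weighted set cover (take the candidate with the lowest cost per newly covered cell), the standard route to logarithmic ratios such as the ln(k) + 1 bound quoted above. It is not the authors' HYPER algorithm and omits their pruning strategy and false-positive summarization.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Cell = Tuple[int, str]                        # (transaction id, item)
Rect = Tuple[FrozenSet[int], FrozenSet[str]]  # (transaction set T, item set I)


def cells_of(db: Dict[int, Set[str]]) -> Set[Cell]:
    """All (transaction, item) cells that are present in the database."""
    return {(t, i) for t, items in db.items() for i in items}


def candidate_rects(db: Dict[int, Set[str]], max_size: int = 2) -> List[Rect]:
    """Hypothetical candidate generator: each itemset of up to max_size items,
    paired with every transaction that contains it (its support set), so every
    cell inside a candidate is present in db (no false positives)."""
    items = sorted({i for its in db.values() for i in its})
    rects: List[Rect] = []
    for k in range(1, max_size + 1):
        for itemset in combinations(items, k):
            support = frozenset(t for t, its in db.items() if set(itemset) <= its)
            if support:
                rects.append((support, frozenset(itemset)))
    return rects


def greedy_cover(db: Dict[int, Set[str]], rects: List[Rect]) -> List[Rect]:
    """Greedy weighted set cover over hyperrectangles: repeatedly take the
    candidate with the lowest cost per newly covered cell, where the cost of
    T x I is assumed to be |T| + |I|."""
    uncovered = cells_of(db)
    chosen: List[Rect] = []
    while uncovered:
        best, best_ratio = None, float("inf")
        for ts, its in rects:
            # Only cells not yet covered count; overlap with earlier picks is allowed.
            gain = sum(1 for t in ts for i in its if (t, i) in uncovered)
            if gain == 0:
                continue  # covers nothing new this round
            ratio = (len(ts) + len(its)) / gain
            if ratio < best_ratio:
                best, best_ratio = (ts, its), ratio
        if best is None:
            raise ValueError("candidates cannot cover every cell")
        chosen.append(best)
        uncovered -= {(t, i) for t in best[0] for i in best[1]}
    return chosen


if __name__ == "__main__":
    db = {1: {"a", "b", "c"}, 2: {"a", "b"}, 3: {"a", "b", "c"}, 4: {"d"}}
    cover = greedy_cover(db, candidate_rects(db))
    print(sum(len(t) + len(i) for t, i in cover))  # → 10
```

On this toy database the first pick is {1, 2, 3} × {a, b}, which covers six cells at a cost of 5; the remaining cells are then covered by {1, 3} × {c} and {4} × {d}, for a total cost of 10. Because every single item is a candidate together with its support set, the candidates always cover every cell, so the loop terminates.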