Transactional data are ubiquitous. Several methods, including frequent itemset mining and co-clustering, have been proposed to analyze transactional databases. In this work, we propose a new research problem: succinctly summarizing transactional databases. Solving this problem requires linking the high-level structure of the database to a potentially huge number of frequent itemsets. We formulate the problem as a set covering problem using overlapped hyperrectangles (a concept referred to as a tile in some existing papers). We then prove that this problem and several of its variations are NP-hard, and we reveal its relationship with the compact representation of a directed bipartite graph. We develop an approximation algorithm, Hyper, which achieves a logarithmic approximation ratio in polynomial time. We propose a pruning strategy that significantly speeds up the algorithm, and we also propose an efficient algorithm, Hyper+, that further summarizes the set of hyperrectangles by allowing false positives. Additionally, we show that the hyperrectangles generated by our algorithms can be properly visualized. A detailed study using both real and synthetic datasets shows the effectiveness and efficiency of our approaches in summarizing transactional databases.
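The set covering formulation can be illustrated with a generic greedy weighted set cover. The sketch below is not the Hyper algorithm itself, whose details are not given here; it rests on several assumptions made only for illustration: a hyperrectangle is a pair (transaction ids, items), its cost is the number of transactions plus the number of items, candidates are built from a caller-supplied list of itemsets, and the names `candidate_rectangles`, `greedy_cover`, and `cells` are hypothetical. Greedy selection by lowest cost per newly covered cell is the standard technique that gives a logarithmic approximation ratio for weighted set cover, relative to the best cover drawn from the same candidates.

```python
from math import inf
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Cell = Tuple[int, str]                          # (transaction id, item)
Rect = Tuple[FrozenSet[int], FrozenSet[str]]    # (transaction ids, items)


def cells(rect: Rect) -> Set[Cell]:
    """All (transaction, item) cells spanned by a hyperrectangle."""
    tids, items = rect
    return {(t, i) for t in tids for i in items}


def candidate_rectangles(db: Dict[int, Set[str]],
                         itemsets: Iterable[Iterable[str]]) -> List[Rect]:
    """Assumed candidate family: each itemset paired with every
    transaction that contains it, so no candidate has false positives."""
    rects: List[Rect] = []
    for itemset in itemsets:
        items = frozenset(itemset)
        tids = frozenset(t for t, row in db.items() if items <= row)
        if tids and items:
            rects.append((tids, items))
    return rects


def greedy_cover(db: Dict[int, Set[str]], rects: List[Rect]) -> List[Rect]:
    """Generic greedy weighted set cover over the cells of the database."""
    uncovered: Set[Cell] = {(t, i) for t, row in db.items() for i in row}
    chosen: List[Rect] = []
    while uncovered:
        best, best_price, best_new = None, inf, set()
        for rect in rects:
            new = cells(rect) & uncovered
            if not new:
                continue
            tids, items = rect
            # Assumed cost model: |transactions| + |items| per rectangle.
            price = (len(tids) + len(items)) / len(new)
            if price < best_price:
                best, best_price, best_new = rect, price, new
        if best is None:
            raise ValueError("candidates cannot cover every cell")
        chosen.append(best)
        uncovered -= best_new
    return chosen


# Toy database: transaction id -> set of items (illustrative data only).
db = {1: {"a", "b", "c"}, 2: {"a", "b", "c"}, 3: {"a", "b"}, 4: {"d"}}
itemsets = [{"a", "b", "c"}, {"a", "b"}, {"a"}, {"b"}, {"c"}, {"d"}]
summary = greedy_cover(db, candidate_rectangles(db, itemsets))
for tids, items in summary:
    print(sorted(tids), sorted(items))
```

Including every single item among the candidate itemsets guarantees that a cover exists, since each cell is then contained in at least one candidate. The selected hyperrectangles may overlap, which matches the overlapped scheme described above.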