One of the most studied problems in data mining is computing the collection of frequent item sets in large transactional databases. One obstacle to the applicability of frequent-set mining is that the output collection can be far too large for users to examine and understand carefully. Even restricting the output to the border of the frequent item-set collection does little to alleviate the problem.

In this paper we address the issue of overwhelmingly large output size by introducing and studying the following problem: What are the k sets that best approximate a collection of frequent item sets? We measure how well k sets approximate a collection by the size of the part of the collection they cover, i.e., the number of sets in the collection that are included in one of the k sets. We also specify a bound on the number of extra sets, those outside the collection, that are allowed to be covered. We examine several variants of the problem, demonstrate the hardness of each, and provide simple polynomial-time approximation algorithms. We give empirical evidence that the approximation methods work well in practice.
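The coverage objective described above can be illustrated with a small greedy sketch. This is not the paper's algorithm, only a standard greedy heuristic for maximum coverage under simplifying assumptions: candidate sets are drawn from the collection itself, and the bound on extra covered sets is ignored. The names `greedy_k_cover` and `coverage` are hypothetical and chosen for this illustration.

```python
from typing import FrozenSet, Iterable, List, Set

Itemset = FrozenSet[str]


def coverage(collection: Iterable[Itemset], chosen: Iterable[Itemset]) -> int:
    """Count members of the collection that are subsets of some chosen set."""
    picked = list(chosen)
    return sum(1 for s in set(collection) if any(s <= c for c in picked))


def greedy_k_cover(collection: Iterable[Itemset], k: int) -> List[Itemset]:
    """Greedily pick up to k sets, each time taking the candidate that
    covers the most still-uncovered members of the collection.

    Assumption: candidates are restricted to the collection itself.
    """
    members: Set[Itemset] = set(collection)
    # Fixed candidate order so that ties are broken deterministically.
    candidates = sorted(members, key=lambda s: (len(s), sorted(s)))
    uncovered: Set[Itemset] = set(members)
    chosen: List[Itemset] = []
    for _ in range(k):
        best, best_gain = None, 0
        for cand in candidates:
            gain = sum(1 for s in uncovered if s <= cand)
            if gain > best_gain:
                best, best_gain = cand, gain
        if best is None:  # nothing left to cover
            break
        chosen.append(best)
        uncovered = {s for s in uncovered if not s <= best}
    return chosen


# Toy collection: all non-empty subsets of {a, b, c}, plus {d} and {c, d}.
toy = [frozenset(x) for x in
       ["a", "b", "c", "ab", "ac", "bc", "abc", "d", "cd"]]
summary = greedy_k_cover(toy, 2)
print(summary, coverage(toy, summary))
```

On the toy collection, one chosen set ({a, b, c}) covers 7 of the 9 members, and adding {c, d} covers the remaining two. Restricting candidates to the collection is a design choice made here for simplicity: because a frequent item-set collection is closed under taking subsets, such candidates cover no sets outside the collection.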