Approximating a collection of frequent sets

Authors:
Foto Afrati;Aristides Gionis;Heikki Mannila
Affiliations:
University of Athens, Greece;University of Helsinki, Finland;University of Helsinki, Finland
Venue:
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2004

Abstract

Computing the collection of frequent item sets in large transactional databases is one of the most studied problems in data mining. One obstacle to the applicability of frequent-set mining is that the output collection can be far too large for users to examine and understand carefully. Even restricting the output to the border of the frequent item-set collection does little to alleviate the problem.

In this paper we address the issue of overwhelmingly large output size by introducing and studying the following problem: What are the k sets that best approximate a collection of frequent item sets? We measure how well k sets approximate a collection by the size of the part of the collection they cover, i.e., the part of the collection that is included in one of the k sets. We also specify a bound on the number of extra sets that are allowed to be covered. We examine several problem variants, demonstrate the hardness of the corresponding problems, and provide simple polynomial-time approximation algorithms. We give empirical evidence showing that the approximation methods work well in practice.
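
To make the coverage measure concrete, the following is a minimal Python sketch under stated assumptions; it is not taken from the paper. It assumes that a chosen set covers every non-empty subset of itself, that "extra" sets are covered subsets that do not belong to the collection, and it uses the standard greedy heuristic for maximum coverage as a stand-in for the approximation algorithms the abstract mentions. The function names (`nonempty_subsets`, `coverage`, `greedy_k_sets`) and the toy collection are hypothetical.

```python
from itertools import combinations


def nonempty_subsets(itemset):
    """Yield every non-empty subset of `itemset` as a frozenset."""
    items = sorted(itemset)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def coverage(chosen, collection):
    """Return (covered, extra) for the chosen sets.

    covered: members of `collection` contained in some chosen set.
    extra:   subsets of chosen sets that are NOT in `collection`
             (assumed reading of the abstract's "extra sets").
    """
    spanned = set()
    for s in chosen:
        spanned.update(nonempty_subsets(s))
    return spanned & collection, spanned - collection


def greedy_k_sets(collection, candidates, k):
    """Pick up to k candidates, each time taking the largest coverage gain.

    This is the generic greedy max-coverage heuristic, used here only
    as an illustration; ties go to the earlier candidate.
    """
    chosen, covered = [], set()
    remaining = list(candidates)
    for _ in range(k):
        best, best_gain = None, 0
        for cand in remaining:
            gain = len((set(nonempty_subsets(cand)) & collection) - covered)
            if gain > best_gain:
                best, best_gain = cand, gain
        if best is None:  # nothing left that adds coverage
            break
        chosen.append(best)
        remaining.remove(best)
        covered |= set(nonempty_subsets(best)) & collection
    return chosen


# Toy collection (made up): all subsets of {a,b,c}, plus {d}, {c,d}, {e}.
collection = set(nonempty_subsets("abc")) | {
    frozenset("d"), frozenset("cd"), frozenset("e"),
}
candidates = [frozenset("abc"), frozenset("cd"), frozenset("e")]

chosen = greedy_k_sets(collection, candidates, k=2)
covered, extra = coverage(chosen, collection)

# A set outside the collection covers more members but adds extra sets.
covered_abcd, extra_abcd = coverage([frozenset("abcd")], collection)
```

In this toy run the two chosen sets come from the collection itself, so they introduce no extra sets, whereas the single set {a,b,c,d} covers the same members at the cost of several extra sets. A bound on extra sets, as described in the abstract, would decide whether such a choice is admissible.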