Dense itemsets

Authors:
Jouni K. Seppänen;Heikki Mannila
Affiliations:
Helsinki University of Technology, Finland;Helsinki University of Technology, Finland
Venue:
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2004

Citing 11
Cited 19

Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Fast discovery of association rules

Advances in knowledge discovery and data mining
Using association rules for product assortment decisions: a case study

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
The budgeted maximum coverage problem

Information Processing Letters
Efficient discovery of error-tolerant frequent itemsets in high dimensions

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Mining All Non-derivable Frequent Itemsets

PKDD '02 Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery
Topics in 0--1 data

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining Top.K Frequent Closed Patterns without Minimum Support

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
TSP: Mining Top-K Closed Sequential Patterns

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Pattern Classification (2nd Edition)

Pattern Classification (2nd Edition)
Segmentation problems

Journal of the ACM (JACM)

Mining Approximate Frequent Itemsets from Noisy Data

ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Comparing Subspace Clusterings

IEEE Transactions on Knowledge and Data Engineering
Efficient mining of understandable patterns from multivariate interval time series

Data Mining and Knowledge Discovery
Quantitative evaluation of approximate frequent pattern mining algorithms

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Capturing truthiness: mining truth tables in binary datasets

Proceedings of the 2009 ACM symposium on Applied Computing
Association Analysis Techniques for Bioinformatics Problems

BICoB '09 Proceedings of the 1st International Conference on Bioinformatics and Computational Biology
Towards efficient mining of proportional fault-tolerant frequent itemsets

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
An efficient polynomial delay algorithm for pseudo frequent itemset mining

DS'07 Proceedings of the 10th international conference on Discovery science
Efficient incremental mining of top-K frequent closed itemsets

DS'07 Proceedings of the 10th international conference on Discovery science
Ambiguous frequent itemset mining and polynomial delay enumeration

PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining
Actionability and formal concepts: a data mining perspective

ICFCA'08 Proceedings of the 6th international conference on Formal concept analysis
Mining fault-tolerant item sets using subset size occurrence distributions

IDA'11 Proceedings of the 10th international conference on Advances in intelligent data analysis X
Finding trees from unordered 0–1 data

PKDD'06 Proceedings of the 10th European conference on Principle and Practice of Knowledge Discovery in Databases
Significance and recovery of block structures in binary matrices with noise

COLT'06 Proceedings of the 19th annual conference on Learning Theory
Mining a new fault-tolerant pattern type as an alternative to formal concept discovery

ICCS'06 Proceedings of the 14th international conference on Conceptual Structures: inspiration and Application
Constraint-Based mining of fault-tolerant patterns from boolean data

KDID'05 Proceedings of the 4th international conference on Knowledge Discovery in Inductive Databases
Frequent item set mining

Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
A knowledge-driven bi-clustering method for mining noisy datasets

ICONIP'12 Proceedings of the 19th international conference on Neural Information Processing - Volume Part III
Closed and noise-tolerant patterns in n-ary relations

Data Mining and Knowledge Discovery

Quantified Score

Hi-index	0.00

Visualization

Abstract

Frequent itemset mining has been the subject of a lot of work in data mining research ever since association rules were introduced. In this paper we address a problem with frequent itemsets: that they only count rows where all their attributes are present, and do not allow for any noise. We show that generalizing the concept of frequency while preserving the performance of mining algorithms is nontrivial, and introduce a generalization of frequent itemsets, dense itemsets. Dense itemsets do not require all attributes to be present at the same time; instead, the itemset needs to define a sufficiently large submatrix that exceeds a given density threshold of attributes present.We consider the problem of computing all dense itemsets in a database. We give a levelwise algorithm for this problem, and also study the top-$k$ variations, i.e., finding the k densest sets with a given support, or the k best-supported sets with a given density. These algorithms select the other parameter automatically, which simplifies mining dense itemsets in an explorative way. We show that the concept captures natural facets of data sets, and give extensive empirical results on the performance of the algorithms. Combining the concept of dense itemsets with set cover ideas, we also show that dense itemsets can be used to obtain succinct descriptions of large datasets. We also discuss some variations of dense itemsets.