Feasible itemset distributions in data mining: theory and application

Authors:
Ganesh Ramesh;William A. Maniatty;Mohammed J. Zaki
Affiliations:
University at Albany, SUNY Albany, NY;University at Albany, SUNY Albany, NY;Rensselaer Polytechnic Institute, Troy, NY
Venue:
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Year:
2003

Abstract

Computing frequent itemsets and maximally frequent itemsets in a database are classic problems in data mining. The resource requirements of all extant algorithms for both problems depend on the distribution of frequent patterns, a topic that has not been formally investigated. In this paper, we study properties of length distributions of frequent and maximal frequent itemset collections and provide novel solutions for computing tight lower bounds for feasible distributions. We show how these bounding distributions can help in generating realistic synthetic datasets, which can be used for algorithm benchmarking.
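
As an illustration of the term "length distribution" used in the abstract, the following minimal Python sketch counts how many frequent and maximal frequent itemsets of each length occur in a toy transaction database. It is not the paper's method: the brute-force enumeration, the function names (`frequent_itemsets`, `maximal_itemsets`, `length_distribution`), the toy data, and the absolute support threshold of 3 are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force sketch: map each itemset with support >= min_support to its support."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, k):
            itemset = frozenset(candidate)
            # Support = number of transactions that contain the whole itemset.
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                result[itemset] = support
                found_any = True
        if not found_any:
            # Every subset of a frequent itemset is frequent, so if no
            # k-itemset is frequent, no longer itemset can be either.
            break
    return result


def maximal_itemsets(frequent):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]


def length_distribution(itemsets):
    """Map each itemset length to the number of itemsets of that length."""
    counts = Counter(len(s) for s in itemsets)
    return dict(sorted(counts.items()))


# Hypothetical toy database of five transactions over items a, b, c.
toy_db = [
    frozenset(t)
    for t in ({"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"})
]

frequent = frequent_itemsets(toy_db, min_support=3)
frequent_lengths = length_distribution(frequent)
maximal_lengths = length_distribution(maximal_itemsets(frequent))

print("frequent:", frequent_lengths)
print("maximal: ", maximal_lengths)
```

On this toy data, every single item and every pair is frequent while the full triple is not, so the three pairs are the maximal frequent itemsets. The enumeration is exponential in the number of items and is only meant to make the definitions concrete.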