Average Number of Frequent (Closed) Patterns in Bernouilli and Markovian Databases

Authors:
Francois Rioult;Arnaud Soulet
Affiliations:
Université de Caen Basse-Normandie;Université de Caen Basse-Normandie
Venue:
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Year:
2005

Citing 8

Mining quantitative association rules in large relational tables
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data

Fast discovery of association rules
Advances in knowledge discovery and data mining

Data mining, hypergraph transversals, and machine learning (extended abstract)
PODS '97 Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems

Generating non-redundant association rules
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining

Discovering All Most Specific Sentences by Randomized Algorithms
ICDT '97 Proceedings of the 6th International Conference on Database Theory

A Tight Upper Bound on the Number of Candidate Patterns
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining

Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases

Sampling Large Databases for Association Rules
VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases

Cited 5

Optimizing hypergraph transversal computation with an anti-monotone constraint
Proceedings of the 2007 ACM symposium on Applied computing

A Data Mining Formalization to Improve Hypergraph Minimal Transversal Computation
Fundamenta Informaticae

Estimating the number of frequent itemsets in a large database
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology

A Data Mining Formalization to Improve Hypergraph Minimal Transversal Computation
Fundamenta Informaticae

Looking for a structural characterization of the sparseness measure of (frequent closed) itemset contexts
Information Sciences: an International Journal

Abstract

In data mining, enumerating the frequent or closed patterns is often the first difficult step toward discovering association rules. The number of these patterns is of great interest. Its lower bound is known to be constant whereas its upper bound is exponential, but both bounds correspond to pathological cases. For the first time, we give an average-case analysis of the number of frequent or closed patterns. Average-case analysis is often closer to real situations and gives more information about the role of the parameters. In this paper, two probabilistic models are studied: a Bernoulli model and a Markovian model. In both models and for large databases, we prove that the number of frequent patterns, for a fixed frequency threshold, is exponential in the number of items and polynomial in the number of transactions. On the other hand, for a proportional frequency threshold, the number of frequent patterns is polynomial in the number of items and does not depend on the number of transactions. Finally, we prove in the Bernoulli model that the number of closed patterns, for a proportional frequency threshold, is polynomial in the number of items.
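
To make the quantity studied in the abstract concrete, the following is a minimal sketch in Python. It is not taken from the paper. It assumes a simple Bernoulli database in which each of the n × m entries is present independently with the same probability p; under that assumption, the support of a pattern of k items follows a Binomial(n, p^k) law, and linearity of expectation gives the average number of frequent patterns exactly. The function names and parameters (`binomial_tail`, `expected_frequent_patterns`, `n`, `m`, `p`, `gamma`) are hypothetical and chosen only for illustration.

```python
from math import comb


def binomial_tail(n: int, q: float, gamma: int) -> float:
    """Return P(X >= gamma) for X ~ Binomial(n, q).

    Plain summation; intended for small n only.
    """
    if gamma <= 0:
        return 1.0
    return sum(
        comb(n, j) * q**j * (1.0 - q) ** (n - j)
        for j in range(gamma, n + 1)
    )


def expected_frequent_patterns(n: int, m: int, p: float, gamma: int) -> float:
    """Expected number of non-empty patterns with support >= gamma.

    Assumption (not from the source): every entry of the n x m database
    is present independently with probability p, so a pattern of k items
    occurs in one transaction with probability p**k.
    """
    return sum(
        comb(m, k) * binomial_tail(n, p**k, gamma)
        for k in range(1, m + 1)
    )


if __name__ == "__main__":
    # Compare an absolute (fixed) threshold with one proportional to n.
    n, p = 50, 0.3
    for m in (5, 10, 15):
        fixed = expected_frequent_patterns(n, m, p, gamma=2)
        proportional = expected_frequent_patterns(n, m, p, gamma=n // 10)
        print(m, round(fixed, 2), round(proportional, 2))
```

The sketch only evaluates the expectation numerically for small inputs. It does not reproduce the asymptotic results stated in the abstract, and it does not cover closed patterns or the Markovian model.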