A "Go With the Winners" approach to finding frequent patterns

Authors:
Todd Ebert;Darin Goldstein
Affiliations:
California State University, Long Beach;California State University, Long Beach
Venue:
Proceedings of the 2005 ACM symposium on Applied computing
Year:
2005

Citing 8

Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Fast discovery of association rules

Advances in knowledge discovery and data mining
Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator

ACM Transactions on Modeling and Computer Simulation (TOMACS) - Special issue on uniform random number generation
Efficiently mining long patterns from databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Discovering All Most Specific Sentences by Randomized Algorithms

ICDT '97 Proceedings of the 6th International Conference on Database Theory
On the Complexity of Generating Maximal Frequent and Minimal Infrequent Sets

STACS '02 Proceedings of the 19th Annual Symposium on Theoretical Aspects of Computer Science
Discovering all most specific sentences

ACM Transactions on Database Systems (TODS)
An inequality limiting the number of maximal frequent sets

Abstract

In their seminal work on Go With the Winners (GWW) algorithms, D. Aldous and U. Vazirani [3] proved a sufficient condition on the number of particles needed to reach the bottom of a tree with high probability via a GWW random walk. However, using this result in practice would require knowledge of the entire search tree, which is infeasible for most problems. In this paper we improve slightly on this situation by deriving a recurrence relation that provides an upper bound for a tree's imbalance in terms of the imbalance between tree levels that are close to one another, provided that these latter imbalances can be measured with sufficient accuracy.

We then turn our attention to the problem of finding both frequent and infrequent patterns in a database. One of the most widely used algorithms for finding frequent patterns in memory-resident databases is a randomized algorithm first proposed by Gunopulos et al. [12]. We show that such an algorithm is precisely the kind that the GWW paradigm was designed to improve on. Experimental results using the Splice-junction Gene Sequences Database [4] are also provided and lend empirical evidence of the benefits of using GWW.
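
For readers unfamiliar with the GWW random walk mentioned in the abstract, the following is a minimal sketch of the general idea as it is commonly described: several particles descend a tree at random, and particles stuck at dead-end leaves are replaced by copies of particles that can still move. It is not the authors' implementation. The names `gww_search`, `children`, and `num_particles`, and the dictionary encoding of the tree, are assumptions made for illustration only.

```python
import random


def gww_search(children, root, depth, num_particles, rng=None):
    """Try to reach a node at `depth` with a Go With the Winners walk.

    `children(v)` is a hypothetical callback returning the list of
    children of node `v` (empty for a leaf). Returns a node at the
    target depth, or None if every particle gets stuck.
    """
    rng = rng or random.Random()
    particles = [root] * num_particles
    for _ in range(depth):
        # The "winners" are the particles that can still move down.
        survivors = [v for v in particles if children(v)]
        if not survivors:
            return None
        # Each stuck particle is replaced by a copy of a random winner,
        # so the population size stays at num_particles.
        clones = [rng.choice(survivors)
                  for _ in range(num_particles - len(survivors))]
        # Every particle then steps to a uniformly random child.
        particles = [rng.choice(children(v)) for v in survivors + clones]
    return particles[0]


# Illustrative tree: "a" is a dead end, and only the path through "b"
# reaches depth 3.
tree = {"root": ["a", "b"], "b": ["c"], "c": ["d"]}
result = gww_search(lambda v: tree.get(v, []), "root", 3, 20)
```

With a single particle this reduces to a plain random descent, which fails whenever the walk enters a dead end. The cloning step is what lets a larger population concentrate on the branches that remain viable.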