Parallel frequent set counting

Authors:
D. B. Skillicorn
Affiliations:
Department of Computing and Information Science, Queen's University, Kingston, Canada
Venue:
Parallel Computing - Parallel data-intensive algorithms and applications
Year:
2002

Citing 9
Cited 2

Bagging predictors

Machine Learning
Dynamic itemset counting and implication rules for market basket data

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Scalable parallel data mining for association rules

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
New sampling-based summary statistics for improving approximate query answers

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Mining features for sequence classification

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining frequent patterns by pattern-growth: methodology and implications

ACM SIGKDD Explorations Newsletter - Special issue on “Scalable data mining algorithms”
Mining Sequential Patterns

ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth

Proceedings of the 17th International Conference on Data Engineering
Mining Generalized Association Rules

VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases

Robust and distributed top-n frequent-pattern mining with SAP BW accelerator

Proceedings of the VLDB Endowment
Rule synthesizing from multiple related databases

PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II

Quantified Score

Hi-index	0.00

Visualization

Abstract

Computing the frequent subsets of large multi-attribute data is a key component of local pattern detection data mining algorithms. It is both computation- and data-intensive. The standard parallel algorithms require multiple passes through the data. The cost of data access may easily outweigh any performance gained by parallelizing the computational part. We address two opportunities for performance improvement: using a parallel approximate algorithm that requires only a single pass over the data; and using a probabilistic technique to avoid generating most of the lattice of subsets implied by each object's data. The computation required is only slightly greater than levelwise algorithms, but the amount of data access is much smaller.