Parallel tree-projection-based sequence mining algorithms

Authors:
Valerie Guralnik;George Karypis
Affiliations:
Department of Computer Science and Engineering, Digital Technology Center, and Army HPC Research Center, University of Minnesota, Minneapolis, MN;Department of Computer Science and Engineering, Digital Technology Center, and Army HPC Research Center, University of Minnesota, Minneapolis, MN
Venue:
Parallel Computing
Year:
2004

Abstract

The discovery of sequential patterns is becoming increasingly useful and essential in many scientific and commercial domains. The enormous size of available datasets and the possibly large number of mined patterns demand efficient, scalable, and parallel algorithms. Even though a number of algorithms have been developed to efficiently parallelize frequent pattern discovery algorithms that are based on the candidate-generation-and-counting framework, the problem of parallelizing the more efficient projection-based algorithms has received relatively little attention, and existing parallel formulations have been targeted only toward shared-memory architectures. The irregular and unstructured nature of the task graph generated by these algorithms, and the fact that these tasks operate on overlapping sub-databases, make it challenging to parallelize these algorithms efficiently on scalable distributed-memory parallel computing architectures. In this paper we present and study a variety of distributed-memory parallel algorithms for a tree-projection-based frequent sequence discovery algorithm that are able to minimize the various overheads associated with load imbalance, database overlap, and interprocessor communication. Our experimental evaluation on a 32-processor IBM SP shows that these algorithms are capable of achieving good speedups, substantially reducing the amount of work required to find sequential patterns in large databases.
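
To make the idea of projection-based mining and overlapping sub-databases concrete, the following is a minimal serial sketch in Python. It is not the authors' tree-projection algorithm: it is a simplified prefix-projection scheme in which every event is a single item, and the names `project` and `mine` are hypothetical, chosen only for illustration. Each frequent item spawns a subtask on its own projected database, which is the kind of irregular task graph the abstract refers to.

```python
from collections import defaultdict


def project(database, item):
    """Keep, for each sequence containing `item`, the suffix that follows
    its first occurrence. This is the projected (sub-)database."""
    projected = []
    for seq in database:
        if item in seq:
            projected.append(seq[seq.index(item) + 1:])
    return projected


def mine(database, min_support, prefix=()):
    """Depth-first enumeration of frequent sequences.

    Assumption: sequences are tuples of single items, not itemsets.
    Returns a dict mapping each frequent pattern to its support count.
    """
    # Count, per item, how many sequences contain it at least once.
    counts = defaultdict(int)
    for seq in database:
        for item in set(seq):
            counts[item] += 1

    patterns = {}
    for item in sorted(counts):
        if counts[item] >= min_support:
            pattern = prefix + (item,)
            patterns[pattern] = counts[item]
            # Each frequent extension is an independent subtask that works
            # on its own projected database.
            patterns.update(mine(project(database, item), min_support, pattern))
    return patterns


db = [("a", "b", "c"), ("a", "c"), ("b", "c"), ("a", "b")]
result = mine(db, min_support=2)
for pattern, support in sorted(result.items()):
    print(pattern, support)
```

In this toy database the projected databases for `a` and `b` both contain a suffix of the first sequence, so the two subtasks operate on overlapping data. A distributed-memory formulation has to decide whether to replicate or communicate such shared data, and the subtasks can differ widely in cost, which is where the load-imbalance overhead arises.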