Introduction to algorithms
Analyzing scalability of parallel algorithms and architectures
Journal of Parallel and Distributed Computing - Special issue on scalability of parallel algorithms and architectures
Efficient parallel data mining for association rules
CIKM '95 Proceedings of the fourth international conference on Information and knowledge management
Fast sequential and parallel algorithms for association rule mining: a comparison
Fast sequential and parallel algorithms for association rule mining: a comparison
An effective hash-based algorithm for mining association rules
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Beyond market baskets: generalizing association rules to correlations
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Efficient enumeration of frequent sequences
Proceedings of the seventh international conference on Information and knowledge management
Mining frequent patterns without candidate generation
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
A tree projection algorithm for generation of frequent item sets
Journal of Parallel and Distributed Computing - Special issue on high-performance data mining
Parallel sequence mining on shared-memory machines
Journal of Parallel and Distributed Computing - Special issue on high-performance data mining
Hash based parallel algorithms for mining association rules
DIS '96 Proceedings of the fourth international conference on on Parallel and distributed information systems
Multilevel algorithms for multi-constraint graph partitioning
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Parallel data mining for association rules on shared memory systems
Knowledge and Information Systems
SPADE: An Efficient Algorithm for Mining Frequent Sequences
Machine Learning
Parallel and Distributed Association Mining: A Survey
IEEE Concurrency
Parallel Mining of Association Rules
IEEE Transactions on Knowledge and Data Engineering
Scalable Parallel Data Mining for Association Rules
IEEE Transactions on Knowledge and Data Engineering
Scalable Algorithms for Association Mining
IEEE Transactions on Knowledge and Data Engineering
Mining Sequential Patterns: Generalizations and Performance Improvements
EDBT '96 Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology
PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth
Proceedings of the 17th International Conference on Data Engineering
Fast Parallel Association Rule Mining without Candidacy Generation
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
LPMiner: An Algorithm for Finding Frequent Itemsets Using Length-Decreasing Support Constraint
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
An Efficient Algorithm for Mining Association Rules in Large Databases
VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Mining Algorithms for Sequential Patterns in Parallel: Hash Based Approach
PAKDD '98 Proceedings of the Second Pacific-Asia Conference on Research and Development in Knowledge Discovery and Data Mining
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
A sampling-based framework for parallel data mining
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Parallel mining of closed sequential patterns
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Data & Knowledge Engineering
Mining sequential patterns across multiple sequence databases
Data & Knowledge Engineering
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Parallel exact time series motif discovery
Euro-Par'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part II
An empirical study on mining sequential patterns in a grid computing environment
Expert Systems with Applications: An International Journal
BIDE-Based parallel mining of frequent closed sequences with mapreduce
ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part II
Sequential pattern mining -- approaches and algorithms
ACM Computing Surveys (CSUR)
Computing n-gram statistics in MapReduce
Proceedings of the 16th International Conference on Extending Database Technology
Mind the gap: large-scale frequent sequence mining
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Hi-index | 0.00 |
Discovery of sequential patterns is becoming increasingly useful and essential in many scientific and commercial domains. Enormous sizes of available datasets and possibly large number of mined patterns demand efficient, scalable, and parallel algorithms. Even though a number of algorithms have been developed to efficiently parallelize frequent pattern discovery algorithms that are based on the candidate-generation-and-counting framework, the problem of parallelizing the more efficient projection-based algorithms has received relatively little attention and existing parallel formulations have been targeted only toward shared-memory architectures. The irregular and unstructured nature of the task-graph generated by these algorithms and the fact that these tasks operate on overlapping sub-databases makes it challenging to efficiently parallelize these algorithms on scalable distributed-memory parallel computing architectures. In this paper we present and study a variety of distributed-memory parallel algorithms for a tree-projection-based frequent sequence discovery algorithm that are able to minimize the various overheads associated with load imbalance, database overlap, and interprocessor communication. Our experimental evaluation on a 32 processor IBM SP show that these algorithms are capable of achieving good speedups, substantially reducing the amount of the required work to find sequential patterns in large databases.