Parallel processing is essential for mining frequent closed sequences from massive volumes of data in a timely manner, and MapReduce is an ideal software framework for distributed computing on large data sets across clusters of computers. In this paper, we develop a parallel implementation of the BIDE algorithm on MapReduce, called BIDE-MR. It iteratively assigns the tasks of closure checking and pruning to different nodes in the cluster. After each round of map-combine-partition-reduce, the closed frequent sequences of the round-specific length and the candidates for the next round of computation are generated. Because the candidates and their pseudo-projected databases are independent of each other, BIDE-MR achieves high speed-ups. We implement BIDE-MR on an Apache Hadoop cluster and use it to mine vehicles that frequently appear together from massive records collected at different monitoring sites. The results show that BIDE-MR attains good parallelization.
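The round structure described above can be illustrated with a minimal single-process sketch. This is not the BIDE-MR implementation: it assumes sequences of single items, simulates the map and reduce phases with plain Python functions, omits the combine and partition steps, and replaces BIDE's bi-directional extension check with a naive closure test (a pattern of length k is closed if no frequent pattern of length k+1 contains it with the same support). All function names (`map_extend`, `reduce_frequent`, `mine_closed`) are hypothetical.

```python
from collections import defaultdict


def is_subsequence(short, long):
    """True if `short` occurs in `long` in order (gaps allowed)."""
    it = iter(long)
    return all(item in it for item in short)


def map_extend(prefix, pointers, database):
    """Map step (simulated): scan one prefix's pseudo-projected database.

    `pointers` holds (seq_id, start) pairs instead of copied suffixes.
    Each distinct item found after `start` yields one extended pattern,
    with a pointer just past the item's first occurrence.
    """
    for seq_id, start in pointers:
        seen = set()
        for pos in range(start, len(database[seq_id])):
            item = database[seq_id][pos]
            if item not in seen:
                seen.add(item)
                yield prefix + (item,), (seq_id, pos + 1)


def reduce_frequent(emitted, min_sup):
    """Reduce step (simulated): group by pattern and prune infrequent ones.

    Each sequence contributes at most one pointer per pattern, so the
    number of pointers equals the pattern's support.
    """
    grouped = defaultdict(list)
    for pattern, pointer in emitted:
        grouped[pattern].append(pointer)
    return {p: ptrs for p, ptrs in grouped.items() if len(ptrs) >= min_sup}


def mine_closed(database, min_sup):
    """Run rounds until no candidates remain; return {closed pattern: support}."""
    current = {(): [(i, 0) for i in range(len(database))]}
    closed = {}
    while current:
        # Candidates are independent, so each map call could run on its own node.
        emitted = [pair
                   for prefix, ptrs in current.items()
                   for pair in map_extend(prefix, ptrs, database)]
        nxt = reduce_frequent(emitted, min_sup)
        # Naive closure test (assumption: stands in for BIDE's check).
        for pattern, ptrs in current.items():
            if pattern and not any(
                len(sup_ptrs) == len(ptrs) and is_subsequence(pattern, sup)
                for sup, sup_ptrs in nxt.items()
            ):
                closed[pattern] = len(ptrs)
        current = nxt
    return closed
```

As a usage example, calling `mine_closed` on the four sequences `("a","b","c")`, `("a","b","c")`, `("a","c")`, `("b",)` with `min_sup=2` runs four rounds: the first three yield the frequent patterns of lengths 1, 2, and 3, and the last finds no further extensions. Each round emits the closed patterns of the previous length and the candidates for the next. Because every candidate carries its own pointer list, no candidate needs data from another, which is the independence property the abstract credits for the speed-ups.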