Optimization of frequent itemset mining on multiple-core processor

Authors:
Li Liu (Tsinghua University, Beijing, China, and Intel China Research Center, Beijing, China)
Eric Li (Intel China Research Center, Beijing, China)
Yimin Zhang (Intel China Research Center, Beijing, China)
Zhizhong Tang (Tsinghua University, Beijing, China)
Venue:
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Year:
2007

Abstract

Multi-core processors have proliferated across many domains in recent years. In this paper, we study the performance of frequent pattern mining on a modern multi-core machine. A detailed study shows that, even with the best implementation, current FP-tree based algorithms still under-utilize a multi-core system because of poor data locality and insufficient expression of parallelism. To address this problem, we propose two techniques: a cache-conscious FP-array (frequent pattern array) and a lock-free dataset tiling parallelization mechanism. The FP-array improves data locality and exploits both hardware and software prefetching, yielding an overall 4.0x speedup over the state-of-the-art implementation. Furthermore, to unlock the power of multi-core processors, we propose a lock-free parallelization approach that restructures the FP-tree building algorithm. It not only eliminates locks when fine-grained threads build a single FP-tree, but also improves temporal data locality. With the cache-conscious FP-array and the lock-free parallelization combined, the overall FP-tree algorithm achieves a 24-fold speedup on an 8-core machine. Finally, given the prevalence of multi-core processors, we believe the presented techniques can be applied to other data mining tasks as well.
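The two ideas in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation, and several parts are assumptions. The tiling rule gives each task the transactions that share a leading item, so each task owns one disjoint top-level subtree and no node needs a lock. The "FP-array" stand-in lays the finished tree out in preorder as parallel lists with parent links stored as indices. All function names (`order_transactions`, `build_tree_parallel`, `flatten`, `prefix_paths`) are hypothetical. Because of CPython's global interpreter lock, the sketch shows the lock-free ownership scheme rather than any speedup.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


class Node:
    # Plain FP-tree node: item label, support count, parent link, children by item.
    def __init__(self, item, parent):
        self.item, self.count, self.parent, self.children = item, 0, parent, {}


def order_transactions(transactions, min_support):
    """Drop infrequent items; sort each transaction by descending support (ties by label)."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    ranked = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(ranked)}
    ordered = []
    for t in transactions:
        items = sorted({i for i in t if i in rank}, key=rank.__getitem__)
        if items:
            ordered.append(items)
    return ordered


def insert(node, items):
    """Insert one ordered transaction below `node`, incrementing counts on the path."""
    for item in items:
        child = node.children.get(item)
        if child is None:
            child = Node(item, node)
            node.children[item] = child
        child.count += 1
        node = child


def build_tree_parallel(ordered, workers=4):
    """Hypothetical tiling: one tile per leading item, so each task owns a
    disjoint top-level subtree and no node is ever written by two tasks."""
    root = Node(None, None)
    tiles = defaultdict(list)
    for t in ordered:
        tiles[t[0]].append(t)
    # Top-level nodes are created before any worker starts; during the
    # parallel phase workers only read root.children, so no lock is needed.
    for first in tiles:
        root.children[first] = Node(first, root)

    def build_tile(first):
        sub = root.children[first]  # written only by this task
        for t in tiles[first]:
            sub.count += 1
            insert(sub, t[1:])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(build_tile, tiles))  # list() re-raises worker exceptions
    return root


def flatten(root):
    """Assumed array layout: depth-first preorder into parallel lists, with
    parent links as indices and a per-item header of node indices."""
    items, counts, parents = [], [], []
    header = defaultdict(list)
    stack = [(c, -1) for c in reversed(list(root.children.values()))]
    while stack:
        node, parent_idx = stack.pop()
        idx = len(items)
        items.append(node.item)
        counts.append(node.count)
        parents.append(parent_idx)
        header[node.item].append(idx)
        stack.extend((c, idx) for c in reversed(list(node.children.values())))
    return items, counts, parents, header


def prefix_paths(item, items, counts, parents, header):
    """Conditional pattern base of `item`: walk parent indices upward."""
    paths = []
    for idx in header[item]:
        path, p = [], parents[idx]
        while p != -1:
            path.append(items[p])
            p = parents[p]
        if path:
            paths.append((path[::-1], counts[idx]))
    return paths


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
    tree = build_tree_parallel(order_transactions(data, min_support=2), workers=2)
    print(prefix_paths("c", *flatten(tree)))  # [(['a', 'b'], 2), (['a'], 1), (['b'], 1)]
```

Partitioning on the leading item keeps every write private to one task. The drawback is that tile sizes follow the item-support distribution, so a dominant leading item leaves one task with most of the work, and this sketch makes no attempt to rebalance. Walking `parents` indices through contiguous lists is also only a rough analogue of a cache-conscious layout, since Python lists store references rather than packed values.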