Load balancing approach parallel algorithm for frequent pattern mining

Authors:
Kun-Ming Yu;Jiayi Zhou;Wei Chen Hsiao
Affiliations:
Department of Computer Science and Information Engineering, Chung Hua University;Institute of Engineering Science, Chung Hua University;Department of Information Management, Chung Hua University
Venue:
PaCT'07 Proceedings of the 9th international conference on Parallel Computing Technologies
Year:
2007

Abstract

Association rule mining from transaction-oriented databases is an important problem in data mining. Frequent patterns are crucial for association rule generation, time series analysis, classification, etc. Two categories of algorithms have been proposed: the candidate-set generate-and-test approach (Apriori-like) and the pattern-growth approach. Many methods for association rule mining are based on the FP-tree rather than on Apriori, since Apriori-like algorithms scan the database many times. However, even with the FP-tree data structure, the computation time is costly when the database is large. Parallel and distributed computing is a good strategy for this situation. Some parallel algorithms have been proposed, but most of them do not consider load balancing. In this paper, we propose a parallel and distributed mining algorithm based on the FP-tree structure, Load Balancing FP-Tree (LFP-tree). The algorithm divides the item set for mining by evaluating the tree's width and depth. Moreover, a simple and reliable formula for the loading degree is proposed. The experimental results show that LFP-tree reduces the computation time and has less idle time than Parallel FP-Tree (PFP-tree). In addition, it achieves a better speed-up ratio than PFP-tree as the number of processors grows. The communication time can be reduced by keeping the heavily loaded items on their local computing node.