Parallel frequent itemset mining using systolic arrays

  • Authors:
  • Mohammad Karim Sohrabi;Ahmad Abdollahzadeh Barforoush

  • Affiliations:
  • Computer Engineering and Information Technology Department, Islamic Azad University of Semnan, Semnan, Iran;ISLAB, Computer Engineering and Information Technology, Amirkabir University of Technology, 424 Hafez Ave., Tehran 15914, Iran

  • Venue:
  • Knowledge-Based Systems
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Since extraction of frequent itemsets from a transaction database is crucial to several data mining tasks such as association rule generation, so frequent itemset mining is one of the most important concepts in data mining. One of the major problems in frequent itemset mining is the explosion of the number of results which is directly effecting on the execution time of itemset mining algorithms. To address this problem, closed itemsets have been proposed, which provides concise lossless representations of the original collection of frequent itemsets. Henceforth, the frequencies of all itemsets in the original collection can be reconstructed from the reduced collection. However, the reduction provided by this exact method is not sufficient to solve the pattern explosion problem, mainly because of high dimensional datasets which have large number of items in each transaction. Colossal itemset mining is another solution to reduce the output size which will not be useful if the set of all frequent itemsets have been required. Higher level of performance improvement can be obtained from efficient scalable parallel mining methods. In this paper we represent an efficient scalable parallel algorithm using systolic arrays to conduct mining of frequent itemsets in very large, such as high dimensional, datasets. In our algorithm, we use a bit matrix to compress the dataset and mapping the mining algorithm on the systolic arrays architecture. For this purpose, each transaction of dataset represents as a row in the bit matrix. We use this bit matrix structure to model the pattern mining as a systolic array problem. Our experimental results and performance study show that this algorithm outperforms substantially the best previously developed parallel algorithms.