Parallel and distributed methods for incremental frequent itemset mining

Authors:
M. E. Otey;S. Parthasarathy;Chao Wang;A. Veloso;W. Meira, Jr.
Affiliations:
Comput. & Inf. Sci. Dept., Ohio State Univ., Columbus, OH, USA;-;-;-;-
Venue:
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Year:
2004

Citing: 0
Cited by: 6

Research issues in data stream association rule mining
ACM SIGMOD Record

Toward boosting distributed association rule mining by data de-clustering
Information Sciences: an International Journal

Distributed subgroup mining
PKDD'06 Proceedings of the 10th European conference on Principles and Practice of Knowledge Discovery in Databases

Agent enriched distributed association rules mining: a review
ADMI'11 Proceedings of the 7th international conference on Agents and Data Mining Interaction

Distributed frequent itemset mining framework for incremental data using MPI-style WSRF services
Proceedings of the International Conference on Advances in Computing, Communications and Informatics

Incremental Algorithm for Discovering Frequent Subsequences in Multiple Data Streams
International Journal of Data Warehousing and Mining

Abstract

Traditional methods for data mining typically make the assumption that the data is centralized, memory-resident, and static. This assumption is no longer tenable. Such methods waste computational and input/output (I/O) resources when data is dynamic, and they impose excessive communication overhead when data is distributed. Efficient implementation of incremental data mining methods is, thus, becoming crucial for ensuring system scalability and facilitating knowledge discovery when data is dynamic and distributed. In this paper, we address this issue in the context of the important task of frequent itemset mining. We first present an efficient algorithm which dynamically maintains the required information even in the presence of data updates without examining the entire dataset. We then show how to parallelize this incremental algorithm. We also propose a distributed asynchronous algorithm, which imposes minimal communication overhead for mining distributed dynamic datasets. Our distributed approach is capable of generating local models (in which each site has a summary of its own database) as well as the global model of frequent itemsets (in which all sites have a summary of the entire database). This ability permits our approach not only to generate frequent itemsets, but also to generate high-contrast frequent itemsets, which allows one to examine how the data is skewed over different sites.
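
The ideas in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it is a deliberately naive illustration that keeps support counts for every itemset up to a fixed size, so that adding or removing a transaction touches only that transaction's subsets and never rescans the database. The names IncrementalItemsetCounter, merge, and high_contrast are hypothetical, and defining "high-contrast" as a gap between local and global support is an assumption made only for this example.

```python
from collections import Counter
from itertools import combinations


class IncrementalItemsetCounter:
    """Hypothetical summary of one database: support counts for all
    itemsets of size <= max_size, maintained one transaction at a time."""

    def __init__(self, max_size=2):
        self.max_size = max_size
        self.counts = Counter()
        self.num_transactions = 0

    def _subsets(self, transaction):
        # Sort so that each itemset has a single canonical tuple form.
        items = sorted(set(transaction))
        for k in range(1, self.max_size + 1):
            yield from combinations(items, k)

    def add(self, transaction):
        # Only the subsets of the new transaction are touched.
        self.counts.update(self._subsets(transaction))
        self.num_transactions += 1

    def remove(self, transaction):
        # Assumes the transaction was previously added.
        self.counts.subtract(self._subsets(transaction))
        self.num_transactions -= 1

    def frequent(self, min_support):
        """Return {itemset: relative support} for itemsets at or above min_support."""
        n = self.num_transactions
        if n == 0:
            return {}
        return {s: c / n for s, c in self.counts.items()
                if c > 0 and c / n >= min_support}


def merge(sites):
    """Build a global summary by adding up the local summaries."""
    merged = IncrementalItemsetCounter(max_size=sites[0].max_size)
    for site in sites:
        merged.counts.update(site.counts)
        merged.num_transactions += site.num_transactions
    return merged


def high_contrast(local, global_model, min_support, min_gap):
    """Locally frequent itemsets whose local support differs from the
    global support by at least min_gap (an assumed definition)."""
    g = global_model.num_transactions
    result = {}
    for itemset, support in local.frequent(min_support).items():
        gap = abs(support - global_model.counts[itemset] / g)
        if gap >= min_gap:
            result[itemset] = gap
    return result


# Two hypothetical sites with skewed data.
site_a = IncrementalItemsetCounter()
for t in (["bread", "milk"], ["bread", "milk"], ["bread", "eggs"]):
    site_a.add(t)

site_b = IncrementalItemsetCounter()
for t in (["eggs", "tea"], ["eggs", "tea"], ["milk", "tea"]):
    site_b.add(t)

global_model = merge([site_a, site_b])
skewed_at_a = high_contrast(site_a, global_model, min_support=0.6, min_gap=0.3)
```

In this toy data, "bread" appears in every transaction at site A but in only half of all transactions overall, so it is reported as skewed toward site A. A real system would avoid enumerating all subsets and would exchange far less than full count tables between sites; the sketch only shows how local summaries, a global summary, and a contrast measure relate to one another.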