Efficient algorithms for frequent pattern mining in many-task computing environments

Authors:
Kawuu W. Lin; Yu-Chin Lo
Affiliations:
-;-
Venue:
Knowledge-Based Systems
Year:
2013


Abstract

The goal of data mining is to discover hidden, useful information in large databases, and mining frequent patterns from transaction databases is one of its important problems. As the database grows, so do the computation time and the memory required; as the number of items grows, user behaviours also become more complex. To cope with this increasing complexity, many researchers have applied parallel and distributed computing techniques to the discovery of frequent patterns in large amounts of data. However, most studies have focused on improving the performance of a single task and have neglected the many-task computing issue, which is important in current cloud-computing environments. In these environments, an application is often provided as a service (e.g., the Google search engine), which means that many users may use it simultaneously. In this paper, we propose a set of algorithms for frequent pattern mining, comprising the Equal Working Set (EWS), Request On Demand (ROD), Small Size Working Set (SSWS) and Progressive Size Working Set (PSWS) algorithms, that provides a fast and scalable mining service in many-task computing environments. Empirical evaluations under various simulation conditions show that the proposed algorithms deliver excellent performance with respect to scalability and execution time.
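
For readers unfamiliar with the underlying problem, the sketch below illustrates what "mining frequent patterns from a transaction database" means: counting how many transactions contain each itemset and keeping those that meet a minimum support. It is a brute-force illustration only and is not the EWS, ROD, SSWS or PSWS algorithm, whose details the abstract does not give. The function name `frequent_itemsets`, its parameters and the toy transactions are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Count itemsets of up to max_size items and keep the frequent ones.

    Hypothetical helper for illustration: a brute-force enumeration,
    not one of the algorithms proposed in the paper.
    """
    counts = Counter()
    for transaction in transactions:
        # Deduplicate and sort so each itemset has one canonical tuple form.
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    # An itemset is frequent if at least min_support transactions contain it.
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Toy transaction database (made-up data).
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]

frequent = frequent_itemsets(transactions, min_support=3)
```

The enumeration grows combinatorially with transaction length and `max_size`, which is one reason the literature turns to parallel and distributed techniques for large databases.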