Distributed data mining patterns and services: an architecture and experiments

Authors:
Eugenio Cesario; Domenico Talia
Affiliations:
ICAR-CNR, Via P. Bucci 41C, 87036 Rende (CS), Italy; ICAR-CNR, Via P. Bucci 41C, 87036 Rende (CS), Italy and DEIS-University of Calabria, Via P. Bucci 41C, 87036 Rende (CS), Italy
Venue:
Concurrency and Computation: Practice & Experience
Year:
2012

Abstract

Distributed data mining implements techniques for analyzing data on distributed computing systems by exploiting data distribution and parallel algorithms. The grid is a computing infrastructure for implementing distributed high-performance applications and solving complex problems; it offers effective support for the implementation and use of data mining and knowledge discovery systems. The Web Services Resource Framework (WSRF) has become the standard for implementing grid services and applications, and it can be exploited to develop high-level services for distributed data mining applications. This paper describes how distributed data mining patterns, such as collective learning, ensemble learning, and meta-learning models, can be implemented as WSRF mining services that exploit the grid infrastructure. The goal of this work is to design a distributed architectural model that can be used for different distributed mining patterns, deployed as grid services, to analyze dispersed data sources. To validate this approach, we also present the implementation of two clustering algorithms on the developed architecture: distributed k-means and distributed expectation maximization (EM), used as pilot examples to show the suitability of the implemented service-oriented framework. An extensive evaluation of the framework's performance is provided. Copyright © 2011 John Wiley & Sons, Ltd.
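The abstract does not spell out how the distributed k-means service coordinates its nodes. A common way to distribute k-means, and the one assumed in this sketch, is to have each data node compute per-cluster sums and counts (sufficient statistics) over its local partition, while a coordinator aggregates them and recomputes the centroids. Only k·d sums and k counts leave each node, so the raw data stay where they are. The function names `local_statistics` and `distributed_kmeans` are hypothetical, remote service invocations are replaced by a plain loop, and this is not the authors' WSRF implementation.

```python
import numpy as np


def local_statistics(partition, centroids):
    """Runs on one data node (hypothetical): assign each local point to its
    nearest centroid and return per-cluster sums and counts."""
    # Squared Euclidean distance from every point to every centroid: shape (n, k).
    dists = ((partition[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    k, d = centroids.shape
    sums = np.zeros((k, d))
    counts = np.zeros(k, dtype=int)
    # Unbuffered scatter-add so repeated labels accumulate correctly.
    np.add.at(sums, labels, partition)
    np.add.at(counts, labels, 1)
    return sums, counts


def distributed_kmeans(partitions, initial_centroids, max_iter=100, tol=1e-6):
    """Coordinator loop (hypothetical): gather local statistics, update centroids."""
    centroids = np.asarray(initial_centroids, dtype=float).copy()
    for _ in range(max_iter):
        # In a service-oriented deployment each call would be a remote
        # invocation on the node holding that partition (assumption).
        stats = [local_statistics(p, centroids) for p in partitions]
        total_sums = sum(s for s, _ in stats)
        total_counts = sum(c for _, c in stats)
        new_centroids = centroids.copy()
        nonempty = total_counts > 0  # keep empty clusters where they were
        new_centroids[nonempty] = total_sums[nonempty] / total_counts[nonempty, None]
        shift = np.abs(new_centroids - centroids).max()
        centroids = new_centroids
        if shift < tol:
            break
    return centroids


# Example: two "sites" holding different mixes of the same two clusters.
rng = np.random.default_rng(0)
site_a = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(10, 0.5, (50, 2))])
site_b = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(10, 0.5, (70, 2))])
centroids = distributed_kmeans([site_a, site_b], [[1.0, 1.0], [9.0, 9.0]])
print(centroids)
```

Because the aggregated sums and counts equal those computed over the union of all partitions, each iteration yields the same centroids (up to floating-point rounding) that a centralized k-means would produce from the same initial centroids; only the placement of the computation changes.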