Parallel Data Mining Experimentation Using Flexible Configurations

Authors:
José M. Peña Sánchez;F. Javier Crespo;Ernestina Menasalvas Ruiz;Victor Robles
Affiliations:
-;-;-;-
Venue:
RSCTC '02 Proceedings of the Third International Conference on Rough Sets and Current Trends in Computing
Year:
2002

Citing 3

Papyrus: a system for data mining over local and wide area clusters and super-clusters
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing

Parallel Generalized Association Rule Mining on Large Scale PC Cluster
Revised Papers from Large-Scale Parallel Data Mining, Workshop on Large-Scale Parallel KDD Systems, SIGKDD

PaDDMAS: Parallel and Distributed Data Mining Application Suite
IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing

Cited 1

MOIRAE: an innovative component architecture with distributed control features
ICCS'03 Proceedings of the 2003 international conference on Computational science: Part II

Abstract

When data mining first appeared, several disciplines related to data analysis, such as statistics and artificial intelligence, were combined toward a new goal: extracting significant patterns from data. The original data sources were small datasets and, therefore, traditional machine learning techniques were the most common tools for these tasks. As the volume of data grew, these traditional methods were reviewed and extended with knowledge from experts working in the field of data management and databases. Today's problems are even bigger than before and, once again, a new discipline allows researchers to scale up to these data: distributed and parallel processing. In order to use parallel processing techniques, specific factors about the mining algorithms and the data should be considered. Nowadays, there are several new parallel algorithms, which in most cases are extensions of a traditional centralized algorithm. Many of these algorithms share common core parts and differ only in distribution schema, parallel coordination or load/task balancing methods. We call these groups algorithm families. In this paper we introduce a methodology to implement algorithm families. This methodology is founded on the MOIRAE distributed control architecture. In this work we show how this architecture allows researchers to design parallel processing components that can dynamically change their behavior according to control policies.
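
The idea of an algorithm family (a shared core with interchangeable distribution policies) can be illustrated with a minimal Python sketch. This is not the MOIRAE API and not the authors' implementation: every name here (`FrequencyCounter`, `block_partition`, `round_robin_partition`, `set_policy`) is hypothetical, the "core" is a simple item-frequency count chosen only for illustration, and the chunks are processed sequentially in place of real parallel workers.

```python
from collections import Counter
from typing import Callable, List, Sequence

# All names below are hypothetical illustrations, not part of MOIRAE.
Partitioner = Callable[[Sequence[str], int], List[List[str]]]


def block_partition(items: Sequence[str], workers: int) -> List[List[str]]:
    """Distribution policy 1: contiguous chunks, one per worker."""
    size = -(-len(items) // workers)  # ceiling division
    return [list(items[i * size:(i + 1) * size]) for i in range(workers)]


def round_robin_partition(items: Sequence[str], workers: int) -> List[List[str]]:
    """Distribution policy 2: deal items to workers in turn."""
    return [list(items[i::workers]) for i in range(workers)]


class FrequencyCounter:
    """Shared core of the family: count items and merge partial counts.

    How the data is split is delegated to a policy that can be swapped
    while the component is in use.
    """

    def __init__(self, partitioner: Partitioner, workers: int = 2) -> None:
        self.partitioner = partitioner
        self.workers = workers

    def set_policy(self, partitioner: Partitioner) -> None:
        """Change the distribution policy without touching the core."""
        self.partitioner = partitioner

    def run(self, items: Sequence[str]) -> Counter:
        total: Counter = Counter()
        for chunk in self.partitioner(items, self.workers):
            # Stand-in for one worker's local pass over its chunk.
            total.update(Counter(chunk))
        return total


data = ["a", "b", "a", "c", "b", "a"]
miner = FrequencyCounter(block_partition)
first = miner.run(data)
miner.set_policy(round_robin_partition)  # behavior changes at run time
second = miner.run(data)
```

Both runs produce the same counts because only the distribution policy differs; the core counting and merging logic is shared, which is the property the abstract attributes to members of an algorithm family.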