Parallel Data Mining Experimentation Using Flexible Configurations

  • Authors:
  • José M. Peña Sánchez, F. Javier Crespo, Ernestina Menasalvas Ruiz, Victor Robles

  • Venue:
  • TSCTC '02 Proceedings of the Third International Conference on Rough Sets and Current Trends in Computing
  • Year:
  • 2002

Abstract

When data mining first appeared, several disciplines related to data analysis, such as statistics and artificial intelligence, were combined into a new topic: extracting significant patterns from data. The original data sources were small datasets and, therefore, traditional machine learning techniques were the most common tools for these tasks. As data volumes grew, these traditional methods were reviewed and extended with knowledge from experts in data management and databases. Today's problems are even bigger and, once again, a new discipline allows researchers to scale up to these data volumes: distributed and parallel processing. To use parallel processing techniques, specific factors of both the mining algorithms and the data must be considered. Several new parallel algorithms now exist, most of which are extensions of a traditional centralized algorithm. Many of these algorithms share common core parts and differ only in their distribution scheme, parallel coordination, or load/task balancing methods. We call these groups algorithm families. In this paper we introduce a methodology for implementing algorithm families. This methodology is founded on the MOIRAE distributed control architecture. In this work we show how this architecture allows researchers to design parallel processing components that can dynamically change their behavior according to control policies.
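The abstract describes an algorithm family as a set of parallel algorithms that share a core and differ only in how work is distributed, with a control policy choosing the behavior at run time. The Python sketch below illustrates that idea under stated assumptions and is not the MOIRAE architecture or its API. The names `AlgorithmFamilyMember`, `skew_policy`, `block_partition` and `round_robin_partition` are hypothetical; item-frequency counting (as used in association-rule mining) stands in for the shared core, and a thread pool in a single process stands in for distributed workers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

Transaction = Sequence[str]
Partitioner = Callable[[Sequence[Transaction], int], List[Sequence[Transaction]]]


def block_partition(data: Sequence[Transaction], n: int) -> List[Sequence[Transaction]]:
    """Hypothetical distribution scheme A: contiguous blocks of roughly equal size."""
    if not data:
        return []
    size = max(1, -(-len(data) // n))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def round_robin_partition(data: Sequence[Transaction], n: int) -> List[Sequence[Transaction]]:
    """Hypothetical distribution scheme B: deal transactions out to workers in turn."""
    return [data[i::n] for i in range(n)]


def count_items(chunk: Sequence[Transaction]) -> Counter:
    """Shared core (assumed): local item-frequency counting, identical for every member."""
    counts: Counter = Counter()
    for transaction in chunk:
        counts.update(transaction)
    return counts


def skew_policy(data: Sequence[Transaction]) -> str:
    """Hypothetical control policy: switch schemes when transaction lengths are skewed."""
    if not data:
        return "block"
    lengths = [len(t) for t in data]
    mean = sum(lengths) / len(lengths)
    return "round_robin" if max(lengths) > 2 * mean else "block"


class AlgorithmFamilyMember:
    """One configurable component: fixed core, pluggable distribution, policy-driven choice."""

    def __init__(self, core, partitioners: Dict[str, Partitioner], policy, workers: int = 4):
        self.core = core
        self.partitioners = partitioners
        self.policy = policy
        self.workers = workers

    def run(self, data: Sequence[Transaction]) -> Tuple[str, Counter]:
        # The policy is consulted on every run, so the behavior can change between runs.
        scheme = self.policy(data)
        chunks = self.partitioners[scheme](data, self.workers)
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            partials = list(pool.map(self.core, chunks))
        total: Counter = Counter()
        for partial in partials:
            total.update(partial)  # merge step: sum the local counts
        return scheme, total


miner = AlgorithmFamilyMember(
    core=count_items,
    partitioners={"block": block_partition, "round_robin": round_robin_partition},
    policy=skew_policy,
    workers=2,
)

scheme, counts = miner.run([["a", "b"], ["a"], ["b", "c"], ["a", "c"]])
# scheme == "block"; counts: a=3, b=2, c=2
```

Because the policy is consulted on each call to `run`, the same component would pick contiguous blocks for evenly sized transactions and round-robin distribution when one transaction is much longer than the rest, while the counting core and merge step stay unchanged. A distributed control architecture would make such decisions across nodes with richer metrics; the threshold of twice the mean length is an arbitrary choice for illustration.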