Promoting performance and separation of concerns for data mining applications on the grid

  • Authors:
  • Vasco Furtado; Francisco Flávio de Souza; Walfredo Cirne

  • Affiliations:
  • University of Fortaleza - UNIFOR, Mestrado em Informática Aplicada - MIA, Washington Soares, Fortaleza - CE, Brazil; University of Fortaleza - UNIFOR, Mestrado em Informática Aplicada - MIA, Washington Soares, Fortaleza - CE, Brazil; Universidade Federal de Campina Grande - UFCG, Departamento de Sistemas e Computação, Campina Grande - PB, Brazil

  • Venue:
  • Future Generation Computer Systems - Special section: Data mining in grid computing environments
  • Year:
  • 2007

Abstract

Grid computing brought the promise of making high-performance computing cheaper and more easily available than traditional supercomputing platforms. This promise was very well received by the data mining (DM) community, as DM applications typically process very large datasets and are thus very resource intensive. However, because the Grid is highly dynamic and parallel data mining is prone to load imbalance, obtaining good data mining performance on the Grid is hard. It typically requires the scheduler to understand the inner workings of the application, which raises two related problems. First, good Grid schedulers tend to be highly specialized for the application they target. Second, changing the application may require changing the scheduler, which can be especially challenging when there is no clear separation between the application code and the scheduler code. Here we propose and evaluate a knowledge-based approach that provides abstractions to the DM developer and optimizes the DM application on the Grid at runtime.