Promoting performance and separation of concerns for data mining applications on the grid

Authors:
Vasco Furtado;Francisco Flávio de Souza;Walfredo Cirne
Affiliations:
University of Fortaleza - UNIFOR, Mestrado em Informática Aplicada - MIA, Washington Soares, Fortaleza - CE, Brazil;University of Fortaleza - UNIFOR, Mestrado em Informática Aplicada - MIA, Washington Soares, Fortaleza - CE, Brazil;Universidade Federal de Campina Grande - UFCG, Departamento de Sistemas e Computação, Campina Grande - PB, Brazil
Venue:
Future Generation Computer Systems - Special section: Data mining in grid computing environments
Year:
2007



Abstract

Grid computing brought the promise of making high-performance computing cheaper and more easily available than traditional supercomputing platforms. This promise was very well received by the data mining (DM) community, since DM applications typically process very large datasets and are therefore highly resource intensive. However, because the Grid is highly dynamic and parallel data mining is prone to load imbalance, obtaining good data mining performance on the Grid is hard. It typically requires the scheduler to understand the inner workings of the application, which raises two related problems. First, good Grid schedulers tend to be highly specialized for the application they target. Second, changing the application may require changing the scheduler, which can be especially challenging when there is no clear separation between application code and scheduler code. Here we propose and evaluate a knowledge-based approach that provides abstractions to the DM developer and optimizes the DM application on the Grid at runtime.
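The separation-of-concerns problem described above can be illustrated with a minimal sketch. It does not reproduce the knowledge-based approach proposed in the paper. Instead, it assumes a generic contract in which the DM application only exposes `split` and `merge`, and a scheduler that knows nothing about data mining dispatches the resulting tasks with a simple workqueue (self-scheduling) policy, a swapped-in technique chosen because it absorbs load imbalance on heterogeneous machines. All names (`Task`, `GridApplication`, `ClassCount`, `workqueue`) and the per-task cost estimates are hypothetical, and execution time is simulated with a clock rather than measured on a real Grid.

```python
import heapq
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol, Tuple


@dataclass
class Task:
    """Unit of work the scheduler sees; it knows nothing about data mining."""
    name: str
    cost: float                 # estimated work units (hypothetical estimate)
    run: Callable[[], object]   # computation to execute on a Grid machine


class GridApplication(Protocol):
    """Hypothetical contract between a DM application and a generic scheduler."""
    def split(self) -> List[Task]: ...
    def merge(self, results: List[object]) -> object: ...


class ClassCount:
    """Toy DM application: count class labels over a partitioned dataset."""

    def __init__(self, partitions: List[List[str]]):
        self.partitions = partitions

    def split(self) -> List[Task]:
        # One task per partition; cost is approximated by partition size.
        return [
            Task(f"part{i}", float(len(p)), lambda p=p: Counter(p))
            for i, p in enumerate(self.partitions)
        ]

    def merge(self, results: List[object]) -> object:
        total = Counter()
        for r in results:
            total.update(r)  # add the per-partition label counts
        return total


def workqueue(app: GridApplication, speeds: Dict[str, float]) -> Tuple[object, float]:
    """Give each task to the machine that becomes idle first (simulated clock).

    Faster machines finish sooner and therefore receive more work, which
    absorbs load imbalance without the scheduler knowing the application.
    """
    idle_at = [(0.0, m) for m in sorted(speeds)]  # (time machine is idle, machine)
    heapq.heapify(idle_at)
    results = []
    for task in app.split():
        t, machine = heapq.heappop(idle_at)
        results.append(task.run())  # executed locally in this sketch
        heapq.heappush(idle_at, (t + task.cost / speeds[machine], machine))
    makespan = max(t for t, _ in idle_at)
    return app.merge(results), makespan
```

For example, `workqueue(ClassCount([["yes", "no", "yes"], ["no"], ["yes", "yes"], ["no", "no", "yes", "yes"]]), {"fast": 2.0, "slow": 1.0})` returns `(Counter({'yes': 6, 'no': 4}), 3.5)`. Replacing `ClassCount` with a different DM application requires no change to `workqueue`, which is the kind of decoupling the abstract argues for; a real Grid scheduler would also have to handle machine failures and data transfer, which this sketch ignores.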