Grid computing promised to make high-performance computing cheaper and more readily available than traditional supercomputing platforms. The data mining (DM) community welcomed this promise, since DM applications typically process very large datasets and are therefore resource intensive. However, because the Grid is highly dynamic and parallel data mining is prone to load imbalance, obtaining good data mining performance on the Grid is hard. It typically requires the scheduler to understand the inner workings of the application, which raises two related problems. First, good Grid schedulers tend to be highly specialized for the application they target. Second, changing the application may require changing the scheduler, which is especially challenging when the application and scheduler code are not clearly separated. Here we propose and evaluate a knowledge-based approach that provides abstractions to the DM developer and optimizes the DM application on the Grid at runtime.