A sampling-based method for dynamic scheduling in distributed data mining environment

Authors:
Jifang Li
Affiliations:
Computer Science and Information Technology College, Zhejiang Wanli University, P.R. China
Venue:
WSEAS Transactions on Computers
Year:
2009

Abstract

In this paper, we propose a new solution for dynamic task scheduling in a distributed environment. The key difficulty in scheduling tasks is that the execution time of irregular computations cannot be obtained in advance. For this reason, we propose a method based on sampling, applied to some typical data mining algorithms. We argue that a functional relationship exists among execution time, data size, and the algorithm; therefore, the execution time of a data mining task can be deduced from its data size and algorithm. The experimental results show that almost all the algorithms exhibit quasi-linear scalability, although the slope differs from one algorithm to another. We adopt this sampling method to schedule tasks in a distributed data mining environment. The experimental results also show that the sampling method is applicable to task scheduling in a dynamic environment and can be adopted to obtain better scheduling results.
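The idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes that execution time is approximately linear in data size for each algorithm (following the quasi-linear scalability reported above), fits one line per algorithm from a few timed sample runs, and then uses the predicted times in a simple greedy scheduler. The names `LinearModel`, `fit_linear`, and `schedule` are hypothetical, as is the longest-task-first policy, which stands in for whatever scheduler the paper actually uses.

```python
from dataclasses import dataclass


@dataclass
class LinearModel:
    """Assumed cost model: time = intercept + slope * size."""
    intercept: float
    slope: float

    def predict(self, size: float) -> float:
        return self.intercept + self.slope * size


def fit_linear(samples):
    """Ordinary least squares over (data_size, measured_seconds) pairs."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two sample runs")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("sample runs must use different data sizes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return LinearModel(intercept=mean_y - slope * mean_x, slope=slope)


def schedule(tasks, models, node_count):
    """Greedy placement: longest predicted task first, each on the least loaded node.

    tasks maps a task name to (algorithm_name, data_size);
    models maps an algorithm name to its fitted LinearModel.
    Returns (assignment, loads), where loads[i] is the predicted busy time of node i.
    """
    predicted = {
        name: models[algorithm].predict(size)
        for name, (algorithm, size) in tasks.items()
    }
    loads = [0.0] * node_count
    assignment = {}
    for name in sorted(predicted, key=predicted.get, reverse=True):
        node = min(range(node_count), key=loads.__getitem__)
        assignment[name] = node
        loads[node] += predicted[name]
    return assignment, loads
```

In use, each algorithm would first be timed on a few small samples of the data, for example `fit_linear([(1000, 2.0), (2000, 4.0), (4000, 8.0)])` with invented measurements, and the resulting models passed to `schedule` together with the pending tasks. Keeping a separate slope per algorithm reflects the observation that different algorithms scale at different rates.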