A heterogeneous computing system for data mining workflows

Authors:
Ping Luo;Kevin Lü;Qing He;Zhongzhi Shi
Affiliations:
Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Brunel University, Uxbridge, U.K.;Graduate School of the Chinese Academy of Sciences, Beijing, China;Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Venue:
BNCOD'06 Proceedings of the 23rd British National Conference on Databases, conference on Flexible and Efficient Information Handling
Year:
2006

Abstract

The computing-intensive Data Mining (DM) process calls for the support of a Heterogeneous Computing (HC) system, which consists of multiple computers with different configurations connected by a high-speed LAN, for increased computational power and resources. The DM process can be described as a multi-phase pipeline, and each phase may offer many optional methods. This makes DM workflows so complex that they can be modelled only by a Directed Acyclic Graph (DAG). An HC system needs an effective and efficient scheduling framework that orchestrates all the computing hardware to perform multiple competing DM workflows. Motivated by the need for a practical solution to the scheduling problem for DM workflows, this paper proposes a dynamic DAG scheduling algorithm tailored to the characteristics of the execution time estimation model for DM jobs. Based on an approximate estimate of job execution time, the algorithm first maps DM jobs to machines in a decentralized and diligent (as defined in this paper) manner. The performance of this initial mapping can then be improved through job migrations when necessary. The scheduling heuristic considers both the minimal completion time criterion and the critical path in the DAG. We implement the system in an established Multi-Agent System (MAS) environment, in which existing DM algorithms are reused by encapsulating them into agents. Practical classification problems are used to test and measure system performance. The detailed experimental procedure and an analysis of the results are also discussed in this paper.
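To illustrate how a heuristic might combine the two factors named in the abstract (critical path and minimal completion time), the following is a minimal sketch of a generic static list scheduler. It is not the algorithm proposed in the paper: it is centralized, ignores data-transfer costs, and performs no job migration. The function name `schedule`, the input layout, and the job and machine names in the usage note are all hypothetical, chosen only for illustration.

```python
from functools import lru_cache


def schedule(dag, exec_time):
    """Map DAG jobs to machines (illustrative sketch, not the paper's method).

    dag:       job -> list of successor jobs (every job appears as a key)
    exec_time: job -> {machine: estimated execution time, assumed > 0}
    Returns    job -> (machine, start_time, finish_time)
    """
    # Build the predecessor lists from the successor lists.
    preds = {job: [] for job in dag}
    for job, succs in dag.items():
        for s in succs:
            preds[s].append(job)

    @lru_cache(maxsize=None)
    def rank(job):
        # Length of the longest path from this job to an exit job,
        # using the mean estimated time over machines (critical-path factor).
        mean = sum(exec_time[job].values()) / len(exec_time[job])
        return mean + max((rank(s) for s in dag[job]), default=0.0)

    machines = {m for times in exec_time.values() for m in times}
    machine_free = {m: 0.0 for m in machines}  # when each machine is next idle
    finish = {}
    placement = {}

    # With positive times, decreasing rank is a valid topological order,
    # so every predecessor is placed before its successors.
    for job in sorted(dag, key=rank, reverse=True):
        ready = max((finish[p] for p in preds[job]), default=0.0)

        def completion(m):
            return max(machine_free[m], ready) + exec_time[job][m]

        # Minimal completion time factor: pick the machine that finishes first.
        best = min(exec_time[job], key=completion)
        start = max(machine_free[best], ready)
        finish[job] = start + exec_time[job][best]
        machine_free[best] = finish[job]
        placement[job] = (best, start, finish[job])
    return placement
```

As a usage example under the same assumptions, consider a hypothetical workflow in which a `load` job feeds two alternative classifiers (`tree` and `svm`) whose results are merged by `eval`, with machines `m1` and `m2`. Calling `schedule({"load": ["tree", "svm"], "tree": ["eval"], "svm": ["eval"], "eval": []}, times)`, where `times` gives `load` 2.0/4.0, `tree` 3.0/5.0, `svm` 6.0/7.0 and `eval` 1.0/2.0 on `m1`/`m2`, places `svm` (the job on the critical path) on the faster machine `m1` and moves `tree` to `m2`, so that `eval` finishes at time 9.0.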