The scientific computing community, especially in academia, clearly needs technology to handle and organize the 1 to 100+ terabyte datasets coming from computer simulations and scientific instrumentation. In this paper we briefly describe GrayWulf, an exemplar cluster for data-intensive applications using SQL Server and HPC Clusters. One of the key software components of GrayWulf is Trident, a scientific workflow workbench that automatically schedules workflows across the cluster. We examine the challenges of scheduling workflows on GrayWulf, discuss algorithms to improve performance, and present early results from applying Trident to schedule data-loading workflows on GrayWulf for an actual e-Science project.