Data-intensive applications explore, query, analyze, and, more generally, process very large data sets. They usually lend themselves to parallel implementation, but in many cases such implementations suffer severe performance problems, mainly due to load imbalance, inefficient use of available resources, and improper data partitioning policies. The problem becomes harder when the conditions that cause these issues change at run time. This paper proposes a methodology for dynamically improving the performance of certain data-intensive applications on homogeneous clusters by adapting the size and number of data partitions, and the number of processing nodes, to the current application conditions. To this end, the processing of each exploration is monitored, and the gathered data is used to tune the application dynamically. The tuning parameters included in the methodology are: (i) the partition factor of the data set, (ii) the distribution of the data chunks, and (iii) the number of processing nodes to be used. The methodology assumes that a single execution includes multiple related explorations on the same partitioned data set, and that data chunks are ordered by their processing times during execution so that the most time-consuming partitions are assigned first. The methodology has been validated with the well-known bioinformatics tool BLAST and through extensive simulation-based experimentation. The reported results are encouraging: total execution time of the application is reduced by up to 40% in some cases.
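The idea of assigning the most time-consuming chunks first can be illustrated with a small simulation. The sketch below is not the paper's implementation; it assumes a master that hands the next chunk to whichever worker becomes idle first, identical workers (a homogeneous cluster), and per-chunk processing times already measured in an earlier exploration. The function name `simulate_dispatch` and its parameters are hypothetical.

```python
import heapq


def simulate_dispatch(chunk_times, num_workers, longest_first=True):
    """Simulate a master handing chunks to the first idle worker.

    chunk_times   -- measured processing time of each chunk (assumed known
                     from a previous exploration; hypothetical input)
    num_workers   -- number of identical processing nodes
    longest_first -- if True, dispatch chunks in decreasing order of time

    Returns (makespan, assignment), where assignment[w] lists the chunk
    indices processed by worker w in dispatch order.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")

    order = list(range(len(chunk_times)))
    if longest_first:
        # Most expensive chunks go out first; ties keep their original order.
        order.sort(key=lambda i: chunk_times[i], reverse=True)

    # Min-heap of (time at which the worker becomes idle, worker id).
    idle_at = [(0.0, w) for w in range(num_workers)]
    heapq.heapify(idle_at)
    assignment = [[] for _ in range(num_workers)]

    for i in order:
        free_time, w = heapq.heappop(idle_at)  # earliest idle worker
        assignment[w].append(i)
        heapq.heappush(idle_at, (free_time + chunk_times[i], w))

    makespan = max(t for t, _ in idle_at)
    return makespan, assignment


if __name__ == "__main__":
    times = [1, 1, 1, 1, 4]
    print(simulate_dispatch(times, 2, longest_first=False)[0])
    print(simulate_dispatch(times, 2, longest_first=True)[0])
```

With these five chunk times on two workers, dispatching in the original order leaves the long chunk for last and finishes at 6 time units, while dispatching the longest chunk first finishes at 4. The gap depends entirely on how uneven the chunk times are, so this toy case says nothing about the improvement reported in the paper.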