Executing a large number of independent jobs, or jobs comprising a large number of tasks that perform minimal intertask communication, is a common requirement in many domains. Various technologies, ranging from classic job schedulers to recent cloud technologies such as MapReduce, can be used to execute these "many-tasks" in parallel. In this paper, we present our experience in applying two cloud technologies, Apache Hadoop and Microsoft DryadLINQ, to two bioinformatics applications with these characteristics: a pairwise Alu sequence alignment application and an Expressed Sequence Tag (EST) sequence assembly program. First, we compare the performance of the two cloud technologies on these applications, and for one application we also compare them with a traditional MPI implementation. Next, we analyze the effect of inhomogeneous data on the scheduling mechanisms of the cloud technologies. Finally, we compare the performance of the cloud technologies on virtual and nonvirtual hardware platforms.
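To make the "many-tasks" pattern concrete, the following is a minimal Python sketch of how a pairwise computation can be split into blocks that never communicate with one another. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the function names, the block decomposition, the toy Hamming distance standing in for a real alignment score, and the local thread pool standing in for a Hadoop or DryadLINQ cluster are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations_with_replacement


def hamming_distance(a: str, b: str) -> int:
    """Toy stand-in for a real alignment score (assumes equal-length strings)."""
    return sum(x != y for x, y in zip(a, b))


def make_block_tasks(n: int, block_size: int):
    """Split the upper triangle of an n x n pairwise matrix into independent blocks.

    Each task is a tuple (row_start, row_end, col_start, col_end).
    """
    starts = range(0, n, block_size)
    return [
        (r, min(r + block_size, n), c, min(c + block_size, n))
        for r, c in combinations_with_replacement(starts, 2)
    ]


def run_block(sequences, task):
    """Compute every pair (i, j) with i < j inside one block.

    A block reads only the input sequences, so it needs no other task's output.
    """
    r0, r1, c0, c1 = task
    return {
        (i, j): hamming_distance(sequences[i], sequences[j])
        for i in range(r0, r1)
        for j in range(c0, c1)
        if i < j
    }


def pairwise_distances(sequences, block_size=2, workers=4):
    """Run each block as a separate task and merge the partial results.

    The thread pool is only a local stand-in for a cluster scheduler.
    """
    tasks = make_block_tasks(len(sequences), block_size)
    distances = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda t: run_block(sequences, t), tasks):
            distances.update(partial)
    return distances
```

Because the blocks share no state, a scheduler is free to place them on any worker in any order. When the inputs are inhomogeneous, for example sequences of very different lengths, individual blocks can take very different amounts of time, which is why the scheduling behavior matters for this class of application.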