Cloud computing offers exciting new approaches for scientific computing that leverage the hardware and software investments major commercial players have made in large-scale data centers. Loosely coupled problems are very important in many scientific fields and are on the rise with the ongoing move towards data-intensive computing. Several approaches exist for leveraging clouds and cloud-oriented data processing frameworks to perform pleasingly parallel computations. In this paper we present two pleasingly parallel biomedical applications, (1) assembly of genome fragments and (2) dimension reduction in the analysis of chemical structures, implemented using the cloud infrastructure service based utility computing models of Amazon AWS and Microsoft Windows Azure, as well as the MapReduce-based data processing frameworks Apache Hadoop and Microsoft DryadLINQ. We review each of these frameworks and compare them in terms of performance, efficiency, cost, and usability. The cloud service based utility computing model and the managed parallelism (MapReduce) model exhibited comparable performance and efficiency for the applications we considered. We analyze the variations in cost between the different platform choices (e.g., EC2 instance types), highlighting the need to select the appropriate platform based on the nature of the computation.