A framework for readapting and running bioinformatics applications in the cloud
Proceedings of the 2012 ACM Research in Applied Computation Symposium
The paper presents EvolvingSpace, a data-centric distributed system intended to address the data and application integration problem in bioinformatics data centers. The system uses commodity PCs for both data storage and computation. EvolvingSpace manages data in a decentralized manner, which makes it convenient to store data annotations and can eliminate potential data-access bottlenecks. It indexes distributed data at multiple levels to facilitate the construction of complex workflows composed of applications that run on different types of data. In addition, the paper proposes a data-locality- and workflow-aware scheduling algorithm (ES-Scheduling) that balances data distribution against computing performance, and throughput against workflow response time. We ran extensive experiments on the system with real bioinformatics applications. The results show that the system runs integrated bioinformatics applications efficiently and scales well.
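The abstract does not describe how ES-Scheduling works. The following is therefore only a sketch of the general idea it names: placing workflow tasks by trading data locality against node load. It is a generic greedy list scheduler, not the paper's algorithm. The `Node` and `Task` types, the `schedule` function, and the flat `TRANSFER_COST` penalty are hypothetical assumptions. The scheduler visits tasks in dependency order and sends each one to the node where it would finish earliest. A node that lacks the task's input block wins only when its shorter queue outweighs the transfer penalty.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical flat cost for fetching a data block that is not stored locally.
TRANSFER_COST = 5.0


@dataclass
class Node:
    name: str
    blocks: set[str]          # data blocks stored locally on this node
    busy_until: float = 0.0   # time at which the node becomes free


@dataclass
class Task:
    name: str
    block: str                # the single data block this task reads
    compute: float            # compute time, assumed identical on every node
    deps: list[str] = field(default_factory=list)  # upstream workflow tasks


def schedule(tasks, nodes):
    """Greedy locality-aware list scheduling over a workflow DAG (illustrative only)."""
    by_name = {t.name: t for t in tasks}
    # Visit tasks so that every task comes after all of its dependencies.
    order = TopologicalSorter({t.name: t.deps for t in tasks}).static_order()
    finish, placement = {}, {}
    for name in order:
        task = by_name[name]
        # A task cannot start before all of its upstream tasks have finished.
        ready = max((finish[d] for d in task.deps), default=0.0)
        best_end, best_node = None, None
        for node in nodes:
            start = max(node.busy_until, ready)
            # Non-local data adds the transfer penalty to the task's cost.
            penalty = 0.0 if task.block in node.blocks else TRANSFER_COST
            end = start + task.compute + penalty
            if best_end is None or end < best_end:  # earliest finish time wins
                best_end, best_node = end, node
        best_node.busy_until = best_end
        finish[name] = best_end
        placement[name] = best_node.name
    return placement, finish


# Toy workflow: two independent steps feeding a merge step.
nodes = [Node("n1", {"A"}), Node("n2", {"B"})]
tasks = [
    Task("align", "A", 10.0),
    Task("annotate", "B", 4.0),
    Task("merge", "A", 3.0, deps=["align", "annotate"]),
]
placement, finish = schedule(tasks, nodes)
print(placement, finish["merge"])
```

In this toy run, `merge` returns to `n1`, where block `A` is local, because running it on `n2` would pay the transfer penalty. A real scheduler such as the one the paper proposes would also weigh throughput across many concurrent workflows, which this sketch ignores.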