To run search tasks in a parallel and load-balanced fashion, existing parallel BLAST schemes such as mpiBLAST introduce a data preparation stage that moves database fragments from shared storage to local cluster nodes. Unfortunately, in today's big data era, a quickly growing sequence database becomes too large to move across the network. In this paper, we develop a Scalable Data Access Framework (SDAFT) to solve this problem. It employs a distributed file system (DFS) to provide scalable data access for parallel sequence searches. SDAFT consists of two interlocked components: 1) a data-centric load-balanced scheduler (DC-scheduler) that enforces data-process locality and 2) a translation layer that translates conventional parallel I/O operations into HDFS I/O. By evaluating our SDAFT prototype with real-world databases and queries on a wide variety of computing platforms, we found that SDAFT can reduce I/O cost by a factor of 4 to 10 and double the overall execution performance compared with existing schemes.
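The idea of a scheduler that balances load while keeping each search task on a node that already stores its database fragment can be illustrated with a minimal sketch. This is not the paper's DC-scheduler: the greedy heuristic, the function name `assign_fragments`, and the fragment and node names are all assumptions made for illustration. It only shows how replica locations reported by a DFS might drive task placement.

```python
from collections import defaultdict


def assign_fragments(replicas, nodes):
    """Greedy, locality-aware assignment of database fragments to nodes.

    replicas: dict mapping fragment name -> list of nodes holding a copy
              (hypothetical input, e.g. derived from DFS block locations)
    nodes:    list of worker node names
    Returns a dict mapping node -> list of (fragment, is_local) pairs.
    """
    load = {n: 0 for n in nodes}
    plan = defaultdict(list)
    # Handle the most constrained fragments (fewest replicas) first.
    for frag in sorted(replicas, key=lambda f: (len(replicas[f]), f)):
        local = [n for n in replicas[frag] if n in load]
        # Prefer nodes that hold a replica; fall back to any node otherwise.
        candidates = local or nodes
        # Pick the least-loaded candidate, breaking ties by name.
        target = min(candidates, key=lambda n: (load[n], n))
        load[target] += 1
        plan[target].append((frag, bool(local)))
    return dict(plan)


# Illustrative input: four fragments, each replicated on two of three nodes.
example_replicas = {
    "frag0": ["n1", "n2"],
    "frag1": ["n1", "n3"],
    "frag2": ["n2", "n3"],
    "frag3": ["n1", "n2"],
}
example_plan = assign_fragments(example_replicas, ["n1", "n2", "n3"])
```

In this example every fragment is searched on a node that already stores it, so no fragment has to cross the network before the search starts. A real scheduler would also need to account for uneven fragment sizes, query cost, and node failures, none of which this sketch models.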