SDAFT: a novel scalable data access framework for parallel BLAST

  • Authors:
  • Jiangling Yin; Junyao Zhang; Jun Wang; Wu-chun Feng

  • Affiliations:
  • University of Central Florida, Orlando, Florida; University of Central Florida, Orlando, Florida; University of Central Florida, Orlando, Florida; Virginia Tech, Blacksburg, VA

  • Venue:
  • DISCS-2013 Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems
  • Year:
  • 2013

Abstract

To run search tasks in a parallel and load-balanced fashion, existing parallel BLAST schemes such as mpiBLAST introduce an initial data preparation stage that moves database fragments from shared storage to local cluster nodes. Unfortunately, in today's big data era, a quickly growing sequence database becomes too large to move across the network. In this paper, we develop a Scalable Data Access Framework (SDAFT) to solve this problem. It employs a distributed file system (DFS) to provide scalable data access for parallel sequence searches. SDAFT consists of two interlocked components: 1) a data-centric load-balanced scheduler (DC-scheduler) that enforces data-process locality and 2) a translation layer that translates conventional parallel I/O operations into HDFS I/O. In experiments with our SDAFT prototype system, using a real-world database and queries on a wide variety of computing platforms, we found that SDAFT can reduce I/O cost by a factor of 4 to 10 and double the overall execution performance compared with existing schemes.
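
To make the idea of data-process locality with load balancing concrete, here is a minimal Python sketch. It is not the DC-scheduler described in the paper, whose algorithm the abstract does not specify. It is a simple greedy heuristic under stated assumptions: each database fragment has a known list of nodes holding a replica, every fragment costs one unit of work, and the names `assign_fragments`, `replicas`, and the node labels are hypothetical.

```python
from collections import defaultdict


def assign_fragments(replicas, nodes):
    """Greedy locality-aware assignment (illustrative only, not the paper's algorithm).

    replicas: dict mapping fragment id -> list of nodes that store a copy
    nodes:    list of worker node names
    Returns a dict mapping node -> list of fragment ids assigned to it.
    """
    load = {n: 0 for n in nodes}
    plan = defaultdict(list)
    # Place fragments with the fewest replica choices first; they are hardest to place.
    for frag in sorted(replicas, key=lambda f: (len(replicas[f]), f)):
        local = [n for n in replicas[frag] if n in load]
        # Assumption: fall back to any worker (a remote read) if no replica is local.
        candidates = local or nodes
        # Among candidates, pick the least-loaded node; break ties by name.
        target = min(candidates, key=lambda n: (load[n], n))
        plan[target].append(frag)
        load[target] += 1
    return dict(plan)


# Hypothetical replica placement for four fragments on three nodes.
replicas = {
    "frag0": ["n1", "n2"],
    "frag1": ["n1", "n3"],
    "frag2": ["n2", "n3"],
    "frag3": ["n1"],
}
plan = assign_fragments(replicas, ["n1", "n2", "n3"])
for node in sorted(plan):
    print(node, plan[node])
```

In this toy input every fragment ends up on a node that already stores it, so no fragment has to cross the network before the search starts. A real scheduler would also need to account for uneven fragment sizes and query costs, which this sketch ignores.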