Searching biological sequence databases is one of the most routine tasks in computational biology. This task is significantly hampered by the exponential growth in sequence database sizes. Recent advances in parallelizing biological sequence search applications have enabled bioinformatics researchers to use high-performance computing platforms and, as a result, greatly reduce the execution time of their sequence database searches. However, existing parallel sequence search tools have focused mostly on parallelizing the sequence alignment engine. While the computation-intensive alignment tasks become cheaper on larger machines, the data-intensive tasks of initial preparation and result merging become more expensive. Inefficient handling of input and output data can easily create performance bottlenecks, even on supercomputers, and it also imposes considerable data management overhead. In this paper, we present a set of techniques for efficient and flexible data handling in parallel sequence search applications. We demonstrate our optimizations by improving mpiBLAST, an open-source parallel BLAST tool that is rapidly gaining popularity. These techniques aim to enable flexible database partitioning, reduce I/O by caching small auxiliary files and results, enable parallel I/O on shared files, and process results with scalable protocols. As a result, we reduce mpiBLAST users' operational overhead by removing the requirement to prepartition databases. In addition, our experiments show that these techniques can bring an order-of-magnitude improvement to both the overall performance and the scalability of mpiBLAST.
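The abstract does not describe how mpiBLAST partitions a database at run time. Purely as an illustration of what "removing the requirement to prepartition databases" can mean, here is a minimal sketch, assuming a hypothetical `offsets` table that holds the start position of every sequence in one shared database file (similar in spirit to the offset index a formatted BLAST database keeps) plus the total size as its last entry. It splits the sequences into contiguous, roughly equal-sized "virtual fragments" without ever splitting a sequence. The function name `virtual_partition` is hypothetical and is not part of mpiBLAST.

```python
from bisect import bisect_left
from typing import List, Tuple


def virtual_partition(offsets: List[int], num_fragments: int) -> List[Tuple[int, int]]:
    """Split sequences into contiguous fragments of roughly equal size.

    offsets[i] is the start offset of sequence i and offsets[-1] is the total
    size, so len(offsets) == number_of_sequences + 1 (hypothetical layout).
    Returns half-open (first, end) ranges of sequence indices, one per fragment.
    Fragments may be empty if num_fragments exceeds the number of sequences.
    """
    if num_fragments < 1:
        raise ValueError("num_fragments must be >= 1")
    if len(offsets) < 2 or offsets[0] != 0:
        raise ValueError("offsets must start at 0 and cover at least one sequence")

    n = len(offsets) - 1          # number of sequences
    total = offsets[-1]           # total database size
    bounds = [0]
    for k in range(1, num_fragments):
        target = total * k / num_fragments  # ideal cumulative size at this cut
        # First sequence boundary (not before the previous cut) that reaches the target.
        i = bisect_left(offsets, target, lo=bounds[-1], hi=n)
        # Snap to the neighbouring boundary if it lies closer to the target,
        # so a single very long sequence does not swallow a whole fragment.
        if i > bounds[-1] and target - offsets[i - 1] < offsets[i] - target:
            i -= 1
        bounds.append(i)
    bounds.append(n)
    return [(bounds[k], bounds[k + 1]) for k in range(num_fragments)]


if __name__ == "__main__":
    # Four 100-residue sequences split across two workers.
    print(virtual_partition([0, 100, 200, 300, 400], 2))
```

In such a scheme each worker could compute the same ranges from the shared index and then read only its own byte range of the shared sequence file (for example with MPI-IO), so the number of fragments can follow the number of processes chosen at launch time rather than a partitioning fixed when the database was formatted.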