Toward a methodology of optimizing programs for high-performance computers. ICS '93 Proceedings of the 7th international conference on Supercomputing.
Using MPI: portable parallel programming with the message-passing interface.
A linear space algorithm for computing maximal common subsequences. Communications of the ACM.
Parallel Biological Sequence Comparison Using Prefix Computations. IPPS '99/SPDP '99 Proceedings of the 13th International Symposium on Parallel Processing and the 10th Symposium on Parallel and Distributed Processing.
Introduction to the cell multiprocessor. IBM Journal of Research and Development - POWER5 and packaging.
BioBench: A Benchmark Suite of Bioinformatics Applications. ISPASS '05 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005.
Cell broadband engine architecture and its first implementation: a performance view. IBM Journal of Research and Development.
Cell/B.E. blades: building blocks for scalable, real-time, interactive, and digital media servers. IBM Journal of Research and Development.
A parallel strategy for biological sequence alignment in restricted memory space. Journal of Parallel and Distributed Computing.
Recognition of circular patterns on GPUs: Performance analysis and contributions. Journal of Parallel and Distributed Computing.
Programming the Intel 80-core network-on-a-chip terascale processor. Proceedings of the 2008 ACM/IEEE conference on Supercomputing.
Intuitive Bioinformatics for Genomics Applications: Omega-Brigid Workflow Framework. IWANN '09 Proceedings of the 10th International Work-Conference on Artificial Neural Networks: Part II: Distributed Computing, Artificial Intelligence, Bioinformatics, Soft Computing, and Ambient Assisted Living.
Optimizing data intensive GPGPU computations for DNA sequence alignment. Parallel Computing.
A comparative study of Java and C performance in two large-scale parallel applications. Concurrency and Computation: Practice & Experience.
Next-generation bioinformatics. Bioinformatics.
Parallel linear space algorithm for large-scale sequence alignment. Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing.
Characterization of Smith-Waterman sequence database search in X10. Proceedings of the 2012 ACM SIGPLAN X10 Workshop.
Direct approaches to exploit many-core architecture in bioinformatics
Future Generation Computer Systems
Current computer engineering evolves at an accelerated pace, with hardware advancing towards new chip multiprocessor (CMP) architectures and supporting software moving towards new programming and abstraction paradigms, in order to obtain maximum efficiency from the hardware at low cost. In this context, Tilera Corporation has developed a new CMP architecture with 64 cores (tiles), called Tile64, and has launched several Peripheral Component Interconnect Express (PCIe) cards that are used and monitored from a host Personal Computer (PC). These cards can execute parallel applications written in C/C++ and compiled with the Tile-GCC compiler. We have previously demonstrated the usefulness of the Tile64 architecture for bioinformatics [S. Galvez, D. Diaz, P. Hernandez, F.J. Esteban, J.A. Caballero, G. Dorado, Next-generation bioinformatics: using many-core processor architecture to develop a web service for sequence alignment, Bioinformatics, 26 (2010) 683-686]. We chose a bioinformatics algorithm to test the many-core Tile64 architecture because of the demanding needs of current bioinformatics: data-intensive workloads, high space and time requirements, and massive computation. This algorithm, known as Needleman-Wunsch/Smith-Waterman (NW/SW), obtains an optimal sequence alignment at quadratic cost in both time and space, but it must be optimized to take full advantage of parallel computing. In this paper we redesign, implement and fine-tune this algorithm, introducing key optimizations and changes that exploit specific Tile64 characteristics: RISC architecture, each tile's local cache, memory word length, shared memory usage, RAM file system, inter-tile communication and job selection from a pool. The resulting algorithm, named MC64-NW/SW (Multicore64 Needleman-Wunsch/Smith-Waterman), achieves a gain of ~1000% compared with the same algorithm on an x86 multi-core architecture.
To the best of our knowledge, our NW/SW implementation is the fastest ever published for a standalone PC when aligning a pair of sequences longer than 20 kb.
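The quadratic time and space cost of NW/SW mentioned in the abstract can be illustrated with a minimal C sketch of Needleman-Wunsch global alignment scoring. It is not the MC64-NW/SW implementation: the scoring scheme (match +1, mismatch -1, linear gap penalty -1) and the function name `nw_score_wavefront` are hypothetical assumptions. The matrix is filled in anti-diagonal (wavefront) order, a common general technique for exposing parallelism in this recurrence, because every cell on one anti-diagonal depends only on the two preceding anti-diagonals.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical scoring scheme (an assumption, not the paper's):
   match +1, mismatch -1, linear gap penalty -1. */
#define MATCH      1
#define MISMATCH (-1)
#define GAP      (-1)

static int max3(int a, int b, int c) {
    int m = a > b ? a : b;
    return m > c ? m : c;
}

/* Hypothetical helper: global (Needleman-Wunsch) alignment score.
   Stores the full (n+1) x (m+1) matrix H, so it needs O(n*m) time and
   O(n*m) space. Cells are filled anti-diagonal by anti-diagonal
   (d = i + j); each cell on diagonal d reads only cells on diagonals
   d-1 and d-2, so the cells of one diagonal are mutually independent. */
int nw_score_wavefront(const char *a, const char *b) {
    size_t n = strlen(a), m = strlen(b);
    size_t cols = m + 1;
    int *H = malloc((n + 1) * cols * sizeof *H);
    if (H == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }

    /* Boundary conditions: aligning a prefix against nothing costs one gap per residue. */
    for (size_t i = 0; i <= n; i++) H[i * cols] = (int)i * GAP;
    for (size_t j = 0; j <= m; j++) H[j] = (int)j * GAP;

    for (size_t d = 2; d <= n + m; d++) {
        /* Keep 1 <= i <= n and 1 <= j = d - i <= m. */
        size_t i_lo = d > m ? d - m : 1;
        size_t i_hi = d - 1 < n ? d - 1 : n;
        /* No dependence between iterations of this loop: a parallel
           version could split it across cores, with a barrier per diagonal.
           Here it runs sequentially. */
        for (size_t i = i_lo; i <= i_hi; i++) {
            size_t j = d - i;
            int s = (a[i - 1] == b[j - 1]) ? MATCH : MISMATCH;
            H[i * cols + j] = max3(H[(i - 1) * cols + (j - 1)] + s, /* (mis)match */
                                   H[(i - 1) * cols + j] + GAP,     /* gap in b */
                                   H[i * cols + (j - 1)] + GAP);    /* gap in a */
        }
    }

    int score = H[n * cols + m];
    free(H);
    return score;
}
```

For example, under this scoring scheme `nw_score_wavefront("ACGT", "AGT")` returns 2 (three matches and one gap). The sketch returns only the score; recovering the alignment itself requires a traceback over the stored matrix, which is why the full quadratic-space matrix is kept. The Smith-Waterman local variant differs in that each cell is additionally clamped at zero and the result is the maximum over all cells rather than the bottom-right cell.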