Transactions on high-performance embedded architectures and compilers III
The Cell Broadband Engine Architecture is a new heterogeneous multi-core architecture targeted at compute-intensive workloads. The Cell BE has several features that are unique among high-performance general-purpose processors, such as static instruction scheduling, extensive support for vectorization, scratch-pad memories, explicitly programmed DMA transfers, mailbox communication, and multiple processor cores. These features must be used explicitly to obtain high performance. Yet little published work reports on how to apply them and how much each contributes to performance. This paper presents our experiences with programming the Cell BE architecture. Our test application is Clustal W, a bioinformatics program for multiple sequence alignment. We report on how we apply the unique features of the Cell BE to Clustal W and how important each is to obtaining high performance. By making extensive use of vectorization and by parallelizing the application across all cores, we speed up the pairwise alignment phase of Clustal W by a factor of 51.2 over PPU (superscalar) execution. The progressive alignment phase is sped up by a factor of 5.7 over PPU execution, resulting in an overall speedup of 9.1.
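The overall speedup is not an average of the per-phase speedups; it depends on how much of the baseline (PPU-only) run time each phase takes, as in Amdahl's law. The following is a minimal sketch of that relationship, not code from the paper. The function names and the two-phase model are illustrative assumptions: the model ignores any other work, such as Clustal W's guide-tree construction, so the implied fraction it produces is an inference under that assumption, not a measured figure.

```python
from typing import Sequence


def overall_speedup(fractions: Sequence[float], speedups: Sequence[float]) -> float:
    """Combine per-phase speedups into a whole-program speedup (Amdahl's law).

    fractions[i] is the share of the baseline run time spent in phase i
    (the shares must sum to 1); speedups[i] is how much faster phase i runs
    after optimization.
    """
    if len(fractions) != len(speedups):
        raise ValueError("need exactly one speedup per phase")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("phase fractions must sum to 1")
    # Baseline run time is normalized to 1.0; each phase shrinks by its own factor.
    new_time = sum(f / s for f, s in zip(fractions, speedups))
    return 1.0 / new_time


def implied_first_fraction(s1: float, s2: float, s_total: float) -> float:
    """Hypothetical helper: solve 1/s_total = f/s1 + (1 - f)/s2 for f.

    Only valid under the simplifying assumption that exactly two phases
    make up the whole baseline run time.
    """
    return (1.0 / s_total - 1.0 / s2) / (1.0 / s1 - 1.0 / s2)


if __name__ == "__main__":
    f = implied_first_fraction(51.2, 5.7, 9.1)
    print(f"implied pairwise share of baseline time: {f:.2f}")
    print(f"check: {overall_speedup([f, 1.0 - f], [51.2, 5.7]):.1f}")
```

Under the two-phase assumption, the reported factors of 51.2 and 5.7 combine to 9.1 only if the pairwise phase takes roughly 42% of the PPU run time. The slower-improving phase therefore dominates the optimized run time, which is why the overall gain stays far below 51.2.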