This paper addresses the problem of orchestrating and scheduling parallelism at multiple levels of granularity on heterogeneous multicore processors. We present mechanisms and policies for adaptive exploitation and scheduling of layered parallelism on the Cell Broadband Engine. Our policies combine event-driven task scheduling with malleable loop-level parallelism, which the runtime system exploits whenever task-level parallelism leaves cores idle. We present a scheduler for applications with layered parallelism on Cell and investigate its performance with RAxML, an application that infers large phylogenetic trees using the Maximum Likelihood (ML) method. Our experiments show that the Cell benefits significantly from dynamic methods that selectively exploit the layers of parallelism in the system in response to workload fluctuation. Our scheduler outperforms the MPI version of RAxML, scheduled by the Linux kernel, by up to a factor of 2.6. We are able to execute RAxML on one Cell four times faster than on a dual-processor system with Hyperthreaded Xeon processors, and 5-10% faster than on a single-processor system with a dual-core, quad-thread IBM Power5 processor.
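The idea of using loop-level parallelism to fill cores that task-level parallelism leaves idle can be illustrated with a minimal sketch. This is not the scheduler described in the paper: the function name `spes_per_task`, the default of 8 accelerator cores, and the even division of idle cores among tasks are all assumptions made for illustration only.

```python
from typing import List


def spes_per_task(num_tasks: int, num_spes: int = 8) -> List[int]:
    """Hypothetical policy: decide how many accelerator cores each task gets.

    Assumption (not from the paper): idle cores are divided as evenly as
    possible among the runnable tasks, and each task uses its extra cores
    for loop-level parallelism.
    """
    if num_spes <= 0:
        raise ValueError("num_spes must be positive")
    if num_tasks <= 0:
        return []
    if num_tasks >= num_spes:
        # Enough task-level parallelism: one core per task, no loop-level
        # parallelism. Tasks beyond num_spes wait for a core to free up.
        return [1] * num_spes
    # Fewer tasks than cores: widen each task across the idle cores.
    base, extra = divmod(num_spes, num_tasks)
    return [base + 1 if i < extra else base for i in range(num_tasks)]


if __name__ == "__main__":
    # Re-evaluate the allocation as the number of runnable tasks fluctuates.
    for tasks in (12, 8, 3, 1):
        print(tasks, spes_per_task(tasks))
```

In this sketch the allocation is recomputed whenever the number of runnable tasks changes, which loosely mirrors the event-driven, workload-adaptive behavior the abstract describes; a real runtime would also have to account for the cost of starting and stopping loop-level workers.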