Accurate predictions of parallel program execution time
Journal of Parallel and Distributed Computing
Obstacle avoidance and navigation in the real world by a seeing robot rover
Obstacle avoidance and navigation in the real world by a seeing robot rover
MPI Microtask for programming the cell broadband engineTM processor
IBM Systems Journal
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
CellSs: a programming model for the cell BE architecture
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Scientific computing Kernels on the cell processor
International Journal of Parallel Programming
Experiences with parallelizing a bio-informatics program on the cell BE
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Drug design issues on the cell BE
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Boost.SIMD: generic programming for portable SIMDization
Proceedings of the 2014 Workshop on Programming models for SIMD/Vector processing
Hi-index | 0.00 |
The Cell processor is a typical example of a heterogeneous multiprocessor on-chip architecture that uses several levels of parallelism to deliver high performance. Reducing the gap between peak performance and effective performance is the challenge for software tool developers and the application developers. Image processing and media applications are typical "main stream" applications. We use the Harris algorithm for the detection of interest points in an image as a benchmark to compare the performance of several parallel schemes on a Cell processor. The impact of the DMA controlled data transfers and the synchronizations between SPEs explains the differences between the performance of the different parallelization schemes. The scalability of the architecture is modeled and evaluated.