Achieving high memory performance from heterogeneous architectures with the SARC programming model
Proceedings of the 10th workshop on MEmory performance: DEaling with Applications, systems and architecture
With the advent of multicore architectures, especially heterogeneous ones, peak computational and memory performance are difficult to achieve with traditional programming models. To obtain high computational and memory performance, programmers usually have to reorganize the code and data of their applications completely so that resources are fully used, and they must work with the low-level interfaces offered by vendor-provided SDKs. In this paper, we evaluate the memory performance of the SARC programming model on the Cell BE architecture. We show how we annotated the HPL STREAM and RandomAccess applications, and we report the memory bandwidth obtained. The results indicate that the programming model offers good productivity and competitive performance on this kind of architecture.
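The two kernels named above can be sketched in plain C to show what is being annotated and measured. This is a minimal sketch only. The abstract does not give the SARC annotation syntax, so task boundaries appear only in comments. The names `BLOCK`, `triad_block`, `triad`, `triad_bytes`, and `random_access` are hypothetical. The fixed seed `ran = 1` is a simplification of the HPC Challenge reference code, which derives per-stream starting values instead. The STREAM triad is split into independent blocks, each of which a task-based runtime could schedule separately (for example, on an SPE). The RandomAccess loop XORs pseudo-random values into a power-of-two table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block size: each block is one unit of work that a
   task-based runtime could schedule independently. */
#define BLOCK 1024

/* STREAM triad on one block: a[i] = b[i] + q * c[i]. */
void triad_block(double *a, const double *b, const double *c,
                 double q, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + q * c[i];
}

/* Split the triad into independent blocks. In an annotated version each
   call would be marked as a task with a as output and b, c as inputs;
   here the calls simply run one after another. */
void triad(double *a, const double *b, const double *c,
           double q, size_t n)
{
    for (size_t start = 0; start < n; start += BLOCK) {
        size_t len = (n - start < BLOCK) ? n - start : BLOCK;
        triad_block(a + start, b + start, c + start, q, len);
    }
}

/* Bytes counted for the triad: two loads and one store per element,
   the convention STREAM uses when reporting bandwidth. */
double triad_bytes(size_t n)
{
    return 3.0 * (double)sizeof(double) * (double)n;
}

/* Polynomial of the shift-based generator used by RandomAccess. */
#define POLY 0x0000000000000007ULL

/* RandomAccess (GUPS-style) update loop over a table of 2^log2_size
   entries. Simplification: a single stream seeded with 1. */
void random_access(uint64_t *table, size_t log2_size, size_t updates)
{
    uint64_t mask = ((uint64_t)1 << log2_size) - 1;
    uint64_t ran = 1;
    for (size_t i = 0; i < updates; i++) {
        /* Shift left; if the top bit was set, fold in POLY (same effect
           as the reference code's signed "< 0" test). */
        ran = (ran << 1) ^ ((ran >> 63) ? POLY : 0);
        table[ran & mask] ^= ran;
    }
}
```

XOR updates are self-inverse. Running `random_access` a second time with the same parameters therefore restores the original table. This mirrors how RandomAccess results can be verified.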