Cell Superscalar (CellSs) provides a simple, flexible, and easy-to-use programming approach for the Cell Broadband Engine (Cell/B.E.) that automatically exploits the inherent concurrency of applications at the function or task level. The CellSs environment consists of a source-to-source compiler that translates annotated C or Fortran code, and a runtime library tailored to the Cell/B.E. that orchestrates the concurrent execution of the application. We introduce a technique called bypassing that allows CellSs to perform core-to-core Direct Memory Access (DMA) transfers for generic applications. In this review we briefly summarize the bypassing technique and introduce two improvements: just-in-time renaming and lazy write-back. These extensions come at no additional cost and can increase performance by improving the perceived bandwidth of the Element Interconnect Bus (EIB). Experiments on five fundamental linear algebra kernels demonstrate the applicability of these techniques and quantify the benefit they provide. We also present performance results for a first prototype of CellSs with bypassing.