Software caching has been shown to be a robust approach in multi-core systems that lack hardware support for transparent data transfers between local and global memories. A software cache gives the user a transparent view of the memory architecture and considerably improves the programmability of such systems. However, this software approach can suffer from poor performance due to the considerable overhead of the software mechanisms that maintain memory consistency. This paper presents a set of techniques to mitigate that overhead. A specific write-back mechanism is introduced, based on some degree of speculation about the number of threads actually modifying the same cache lines. A case study based on the Cell BE processor is described. Performance evaluation indicates that the optimized software-cache structures, combined with the proposed code optimizations, yield speedups of 20% to 40% over a traditional software cache approach.