Scratchpad memory: design alternative for cache on-chip memory in embedded systems
Proceedings of the tenth international symposium on Hardware/software codesign
Introduction to the cell multiprocessor
IBM Journal of Research and Development - POWER5 and packaging
Prefetching irregular references for software cache on cell
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
A Novel Asynchronous Software Cache Implementation for the Cell-BE Processor
Languages and Compilers for Parallel Computing
Hybrid access-specific software cache techniques for the cell BE architecture
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
COMIC: a coherent shared memory interface for cell be
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Optimization of Content-Based Image Retrieval Functions
ISM '08 Proceedings of the 2008 Tenth IEEE International Symposium on Multimedia
HD-VideoBench. A Benchmark for Evaluating High Definition Digital Video Applications
IISWC '07 Proceedings of the 2007 IEEE 10th International Symposium on Workload Characterization
An efficient software cache for H.264 motion compensation
SOC'09 Proceedings of the 11th international conference on System-on-chip
A Multidimensional Software Cache for Scratchpad-Based Systems
International Journal of Embedded and Real-Time Communication Systems
Hi-index | 0.00 |
In this paper we propose an instruction to accelerate software caches. While DMAs are very efficient for predictable data sets that can be fetched before they are needed, they introduce a large latency overhead for computations with unpredictable access behavior. Software caches are advantageous when the data set is not predictable but exhibits locality. However, software caches also incur a large overhead. Because the main overhead is in the access function, we propose an instruction that replaces the look-up function of the software cache. This instruction is evaluated using the Multidimensional Software Cache and two multimedia kernels, GLCM and H.264 Motion Compensation. The results show that the proposed instruction accelerates the software cache access time by a factor of 2.6. This improvement translates to a 2.1 speedup for GLCM and 1.28 for MC, when compared with the IBM software cache.