This paper presents an efficient software cache implementation for H.264 Motion Compensation on scratchpad-memory-based systems. For a wide range of applications, especially multimedia applications, the data set is predictable, making it possible to transfer the necessary data before the computation. Some kernels, however, depend on data that become known only just before they are needed, such as the H.264 Motion Compensation (MC). MC has to stall while the data is transferred from main memory. To overcome this problem and increase performance, we analyze the data locality of MC. Based on this analysis, we propose a 2D Software Cache (2DSC) implementation. The 2DSC exploits the application characteristics to reduce overheads, providing on average a 65% improvement over hand-programmed DMA transfers.
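The abstract does not describe the internals of the 2DSC, so the following is only a minimal sketch of the general idea of a software cache indexed in two dimensions, not the authors' implementation. It assumes a direct-mapped organization whose cache lines are rectangular tiles of the reference frame, selected by the tile's (x, y) coordinates. The class name `TileCache2D`, the tile and set sizes, and the `_fetch_tile` helper (a stand-in for a DMA transfer from main memory into the scratchpad) are all hypothetical, and reference blocks are assumed to lie fully inside the frame.

```python
class TileCache2D:
    """Hypothetical direct-mapped software cache whose lines are 2D frame tiles."""

    def __init__(self, frame, tile_w=8, tile_h=8, sets_x=4, sets_y=4):
        self.frame = frame                  # "main memory": list of pixel rows
        self.tile_w, self.tile_h = tile_w, tile_h
        self.sets_x, self.sets_y = sets_x, sets_y
        self.tags = {}                      # slot -> (tile_x, tile_y) currently held
        self.data = {}                      # slot -> tile pixels ("scratchpad")
        self.hits = 0
        self.misses = 0

    def _fetch_tile(self, tx, ty):
        # Stand-in for a DMA transfer of one tile from main memory.
        x0, y0 = tx * self.tile_w, ty * self.tile_h
        return [row[x0:x0 + self.tile_w]
                for row in self.frame[y0:y0 + self.tile_h]]

    def _tile(self, tx, ty):
        # 2D index: the slot is chosen from both tile coordinates.
        slot = (tx % self.sets_x, ty % self.sets_y)
        if self.tags.get(slot) == (tx, ty):
            self.hits += 1
        else:
            self.misses += 1
            self.tags[slot] = (tx, ty)
            self.data[slot] = self._fetch_tile(tx, ty)
        return self.data[slot]

    def read_block(self, x, y, w, h):
        """Return the w-by-h reference block whose top-left pixel is (x, y)."""
        # Look up every tile the block overlaps once, then copy pixels out.
        tiles = {}
        for ty in range(y // self.tile_h, (y + h - 1) // self.tile_h + 1):
            for tx in range(x // self.tile_w, (x + w - 1) // self.tile_w + 1):
                tiles[(tx, ty)] = self._tile(tx, ty)
        return [[tiles[(px // self.tile_w, py // self.tile_h)]
                      [py % self.tile_h][px % self.tile_w]
                 for px in range(x, x + w)]
                for py in range(y, y + h)]
```

In this sketch, two reference blocks requested by nearby motion vectors overlap the same tiles, so the second request is served from the cached tiles without another transfer. That reuse is the kind of data locality the abstract refers to. The overhead reductions the paper attributes to the 2DSC are not modeled here.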