This paper proposes a small data cache that delivers low power consumption together with high performance on multimedia applications. The basic architecture is a split cache consisting of a direct-mapped cache with a small block size (DMC) and a fully-associative buffer with a large block size (FAB). To offset the disadvantage of the small cache area, two hardware mechanisms tailored to the operational behavior of multimedia applications are added: adaptive multi-block prefetching, which issues fetches of varying sizes into the FAB, and efficient block filtering, which keeps data that is likely to be rarely reused out of the DMC. Simulations on MediaBench show that the proposed 5 kB cache achieves power savings of up to 57% and 50% relative to a 16 kB 4-way set-associative cache and a 17 kB stream cache, respectively, while providing nearly equal performance to the former and better performance than the latter.
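The abstract describes the split cache only at a high level. The Python sketch below is a minimal read-only hit/miss model of a DMC + FAB organization under assumptions that do not come from the paper: the block sizes (8 B DMC blocks, 32 B FAB blocks), the 4 kB + 1 kB split (chosen only so the total matches the 5 kB figure), the "double the fetch size on sequential misses" prefetch heuristic, and the "move only referenced sub-blocks to the DMC on FAB eviction" filtering rule are all hypothetical stand-ins. It models no timing or power.

```python
from collections import OrderedDict

# All sizes and policies below are illustrative assumptions, not the paper's values.
SMALL_BLOCK = 8      # bytes per DMC block
LARGE_BLOCK = 32     # bytes per FAB block
DMC_SETS = 512       # 512 x 8 B = 4 kB direct-mapped cache
FAB_ENTRIES = 32     # 32 x 32 B = 1 kB fully-associative buffer
MAX_PREFETCH = 4     # upper bound on large blocks fetched per miss
SUBS = LARGE_BLOCK // SMALL_BLOCK  # small blocks per large block


class SplitCache:
    """Read-only toy model of a DMC + FAB split cache (hypothetical policies)."""

    def __init__(self):
        self.dmc = [None] * DMC_SETS   # small-block number held by each DMC set
        self.fab = OrderedDict()       # large-block number -> referenced sub-block offsets, LRU order
        self.fetch_size = 1            # large blocks fetched on the next miss
        self.next_expected = None      # block just past the last fetched run
        self.hits = 0
        self.misses = 0

    def _move_to_dmc(self, large, used):
        # Block filtering (assumed policy): only sub-blocks referenced while
        # resident in the FAB are copied into the DMC; untouched ones are dropped.
        for offset in used:
            small = large * SUBS + offset
            self.dmc[small % DMC_SETS] = small

    def _fill_fab(self, large):
        if large in self.fab:
            return
        if len(self.fab) >= FAB_ENTRIES:
            victim, used = self.fab.popitem(last=False)  # evict the LRU entry
            self._move_to_dmc(victim, used)
        self.fab[large] = set()

    def access(self, addr):
        """Return True on a hit, False on a miss."""
        small = addr // SMALL_BLOCK
        large = addr // LARGE_BLOCK
        offset = small % SUBS
        if self.dmc[small % DMC_SETS] == small:
            self.hits += 1
            return True
        if large in self.fab:
            self.fab.move_to_end(large)   # refresh LRU position
            self.fab[large].add(offset)
            self.hits += 1
            return True
        self.misses += 1
        # Adaptive multi-block prefetching (assumed heuristic): double the
        # fetch size while misses continue a sequential run, else reset to 1.
        if large == self.next_expected:
            self.fetch_size = min(self.fetch_size * 2, MAX_PREFETCH)
        else:
            self.fetch_size = 1
        for blk in range(large, large + self.fetch_size):
            self._fill_fab(blk)
        self.fab.move_to_end(large)       # demanded block becomes MRU
        self.fab[large].add(offset)
        self.next_expected = large + self.fetch_size
        return False


if __name__ == "__main__":
    cache = SplitCache()
    for addr in range(0, 4096, 8):        # sequential scan with an 8-byte stride
        cache.access(addr)
    print(cache.hits, cache.misses)       # prints: 478 34
```

In this sketch the DMC is filled only when a FAB entry is evicted, so streaming data that is touched once never displaces reusable data in the small direct-mapped array, and sub-blocks that were never referenced are discarded rather than copied. That is one plausible reading of the abstract's block filtering, not necessarily the paper's exact mechanism.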