ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Stride directed prefetching in scalar processors
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Evaluating stream buffers as a secondary cache replacement
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The SimpleScalar tool set, version 2.0
ACM SIGARCH Computer Architecture News
Memory exploration for low power, embedded systems
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Reducing Cache Pollution of Prefetching in a Small Data Cache
ICCD '01 Proceedings of the International Conference on Computer Design: VLSI in Computers & Processors
Adaptive Prefetching for Multimedia Applications in Embedded Systems
Proceedings of the conference on Design, automation and test in Europe - Volume 2
An MPEG-4 performance study for non-SIMD, general purpose architectures
ISPASS '03 Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software
Hardware and software cache prefetching techniques for MPEG benchmarks
IEEE Transactions on Circuits and Systems for Video Technology
Hi-index | 0.00 |
Multimedia applications in general and video processing, such as the MPEG4 Visual stream decoders, in particular are increasingly popular and important workloads for future embedded systems. Due to the high computational requirements, the need for low power, high performance embedded processors for multimedia applications is growing very fast. This paper proposes a new data prefetch mechanism called pattern-driven prefetching. PDP inspects the sequence of data cache misses and detects recurring patterns within that sequence. The patterns that are observed are based on the notions of the inter-miss stride (memory address stride between two misses) and the inter-miss interval (number of cycles between two misses). According to the patterns being detected, PDP initiates prefetch actions to anticipate future accesses and hide memory access latencies. PDP includes a simple yet effective stop criterion to avoid cache pollution and to reduce the number of additional memory accesses. The additional hardware needed for PDP is very limited making it an effective prefetch mechanism for embedded systems. In our experimental setup, we use cycle-level power/performance simulations of the MPEG4 Visual stream decoders from the MoMuSys reference software with various video streams. Our results show that PDP increases performance by as much as 45%, 24% and 10% for 2KB, 4KB and 8KB data caches, respectively, while the increase in external memory accesses remains under 0.6%. In conjunction with these performance increases, system-level (on-chip plus off-chip) energy reductions of 20%, 11.5% and 8% are obtained for 2KB, 4KB and 8KB data caches, respectively. In addition, we report significant speedups (up to 160%) for various other multimedia applications. Finally, we also show that PDP outperforms stream buffers.