Pattern-driven prefetching for multimedia applications on embedded processors

  • Authors:
  • Hassan Sbeyti;Smail Niar;Lieven Eeckhout

  • Affiliations:
  • LAMIH-ROI, University of Valenciennes, Le Mont Houy, Valenciennes Cedex, France;LAMIH-ROI, University of Valenciennes, Le Mont Houy, Valenciennes Cedex, France;ELIS, Ghent University, Sint-Pietersnieuwstraat, Ghent, Belgium

  • Venue:
  • Journal of Systems Architecture: the EUROMICRO Journal
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

Multimedia applications in general and video processing, such as the MPEG4 Visual stream decoders, in particular are increasingly popular and important workloads for future embedded systems. Due to the high computational requirements, the need for low power, high performance embedded processors for multimedia applications is growing very fast. This paper proposes a new data prefetch mechanism called pattern-driven prefetching. PDP inspects the sequence of data cache misses and detects recurring patterns within that sequence. The patterns that are observed are based on the notions of the inter-miss stride (memory address stride between two misses) and the inter-miss interval (number of cycles between two misses). According to the patterns being detected, PDP initiates prefetch actions to anticipate future accesses and hide memory access latencies. PDP includes a simple yet effective stop criterion to avoid cache pollution and to reduce the number of additional memory accesses. The additional hardware needed for PDP is very limited making it an effective prefetch mechanism for embedded systems. In our experimental setup, we use cycle-level power/performance simulations of the MPEG4 Visual stream decoders from the MoMuSys reference software with various video streams. Our results show that PDP increases performance by as much as 45%, 24% and 10% for 2KB, 4KB and 8KB data caches, respectively, while the increase in external memory accesses remains under 0.6%. In conjunction with these performance increases, system-level (on-chip plus off-chip) energy reductions of 20%, 11.5% and 8% are obtained for 2KB, 4KB and 8KB data caches, respectively. In addition, we report significant speedups (up to 160%) for various other multimedia applications. Finally, we also show that PDP outperforms stream buffers.