Improving cache locking performance of modern embedded systems via the addition of a miss table at the L2 cache level

  • Authors:
  • Abu Asaduzzaman;Fadi N. Sibai;Manira Rani

  • Affiliations:
  • Dept. of Computer Science and Engineering, Florida Atlantic University, Boca Raton, Florida, USA;UAE University, Computer Systems Design, P.O. Box 17551, CIT, Al Ain, United Arab Emirates;Dept. of Computer Science and Engineering, Florida Atlantic University, Boca Raton, Florida, USA

  • Venue:
  • Journal of Systems Architecture: the EUROMICRO Journal
  • Year:
  • 2010

Abstract

To provide robustness and high quality of service, modern computing architectures running real-time applications should deliver both high system performance and high timing predictability. Cache memory improves performance by bridging the speed gap between main memory and the CPU. However, the cache introduces timing unpredictability, which creates serious challenges for real-time applications. Herein, we introduce a miss table (MT) based cache locking scheme at the level-2 (L2) cache to further improve timing predictability and the system performance/power ratio. The MT holds information about the block addresses, related to the application being processed, that would cause the most cache misses if not locked. The information in the MT is used to efficiently select the blocks to be locked and the victim blocks to be replaced. This MT-based approach improves timing predictability by locking the important blocks, those with the highest number of misses, inside the cache for the entire execution time. In addition, the technique decreases the average delay per task and the total power consumption by reducing cache misses and avoiding unnecessary data transfers. The MT-based solution is effective for both uniprocessors and multicores. We evaluate the proposed MT-based cache locking scheme by simulating an 8-core processor with two levels of cache using MPEG4 decoding, H.264/AVC decoding, FFT, and MI workloads. Experimental results show that, in addition to improving predictability, a reduction of 21% in mean delay per task and a reduction of 18% in total power consumption are achieved for MPEG4 (and H.264/AVC) by using the MT and locking 25% of the L2. The MT results in about 5% delay and power reductions on these video applications, and possibly more on applications with worse cache behavior. For FFT and MI (and other applications whose code fits inside the level-1 instruction (I1) cache), the mean delay per task increases by only 3% and total power consumption increases by 2% due to the addition of the MT.
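The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `MissTable`, its methods, the per-block miss counter, and the "fewest recorded misses" victim policy are all assumptions made for illustration. The abstract states only that the MT holds information on the block addresses that cause the most misses and that this information guides the choice of locked blocks and victim blocks.

```python
from collections import Counter


class MissTable:
    """Hypothetical miss table: maps block address -> observed miss count."""

    def __init__(self):
        self.misses = Counter()

    def record_miss(self, block_addr):
        # Assumption: miss counts are gathered per block address, e.g. by profiling.
        self.misses[block_addr] += 1

    def blocks_to_lock(self, l2_blocks, lock_fraction):
        """Return the block addresses with the most misses, up to the lock budget."""
        budget = int(l2_blocks * lock_fraction)
        # Sort by descending miss count, then by address for a deterministic result.
        ranked = sorted(self.misses.items(), key=lambda kv: (-kv[1], kv[0]))
        return {addr for addr, _ in ranked[:budget]}

    def pick_victim(self, resident, locked):
        """Assumed policy: evict the unlocked resident block with the fewest misses."""
        candidates = [b for b in resident if b not in locked]
        if not candidates:
            return None  # every resident block is locked
        return min(candidates, key=lambda b: (self.misses[b], b))


# Usage with made-up addresses and a toy 8-block L2, locking 25% of it.
mt = MissTable()
for addr in [0x10, 0x20, 0x10, 0x30, 0x10, 0x20, 0x40]:
    mt.record_miss(addr)

locked = mt.blocks_to_lock(l2_blocks=8, lock_fraction=0.25)  # budget of 2 blocks
victim = mt.pick_victim(resident=[0x10, 0x20, 0x30, 0x40], locked=locked)
```

In this toy run the two most frequently missed blocks (`0x10` and `0x20`) are locked, and the victim is chosen from the remaining unlocked blocks. The 25% lock fraction mirrors the configuration reported in the abstract; the cache size and addresses are invented.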