Improving cache locking performance of modern embedded systems via the addition of a miss table at the L2 cache level

Authors:
Abu Asaduzzaman;Fadi N. Sibai;Manira Rani
Affiliations:
Dept. of Computer Science and Engineering, Florida Atlantic University, Boca Raton, Florida, USA;UAE University, Computer Systems Design, P.O. Box 17551, CIT, Al Ain, United Arab Emirates;Dept. of Computer Science and Engineering, Florida Atlantic University, Boca Raton, Florida, USA
Venue:
Journal of Systems Architecture: the EUROMICRO Journal
Year:
2010

Abstract

To deliver robustness and high quality of service, modern computing architectures running real-time applications must provide both high system performance and high timing predictability. Cache memory improves performance by bridging the speed gap between main memory and the CPU. However, caches introduce timing unpredictability, which creates serious challenges for real-time applications. Herein, we introduce a miss table (MT) based cache locking scheme at the level-2 (L2) cache to further improve timing predictability and the system performance/power ratio. The MT holds the addresses of the blocks, for the application being processed, that cause the most cache misses if not locked. The information in the MT is used to select both the blocks to be locked and the victim blocks to be replaced. This MT-based approach improves timing predictability by locking the blocks with the highest number of misses inside the cache for the entire execution time. In addition, the technique decreases the average delay per task and the total power consumption by reducing cache misses and avoiding unnecessary data transfers. The MT-based solution is effective for both uniprocessors and multicores. We evaluate the proposed MT-based cache locking scheme by simulating an 8-core processor with two levels of cache using MPEG4 decoding, H.264/AVC decoding, FFT, and MI workloads. Experimental results show that, in addition to improving predictability, using the MT and locking 25% of the L2 cache reduces the mean delay per task by 21% and the total power consumption by 18% for MPEG4 (and H.264/AVC). The MT yields about a 5% reduction in delay and power on these video applications, and possibly more on applications with worse cache behavior. For FFT, MI, and other applications whose code fits inside the level-1 instruction (I1) cache, adding the MT increases the mean delay per task by only 3% and the total power consumption by only 2%.
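The selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' simulator: it assumes a single fully associative cache with LRU replacement, a profiling pass that fills the miss table, and locked blocks that are preloaded before execution. The function names (`build_miss_table`, `select_locked`, `count_misses`) and the toy trace are hypothetical.

```python
from collections import Counter, OrderedDict


def build_miss_table(trace, num_blocks):
    """Profiling pass: count misses per block address in a plain LRU cache."""
    cache = OrderedDict()          # block address -> True, oldest entry first
    miss_table = Counter()
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)        # hit: mark as most recently used
        else:
            miss_table[addr] += 1          # miss: record it in the table
            if len(cache) >= num_blocks:
                cache.popitem(last=False)  # evict the least recently used block
            cache[addr] = True
    return miss_table


def select_locked(miss_table, num_blocks, lock_fraction):
    """Pick the blocks with the most misses, up to lock_fraction of the cache."""
    n = int(num_blocks * lock_fraction)
    return {addr for addr, _ in miss_table.most_common(n)}


def count_misses(trace, num_blocks, locked=frozenset()):
    """Replay the trace; locked blocks are assumed preloaded and never evicted."""
    free = num_blocks - len(locked)        # ways left for unlocked blocks
    cache = OrderedDict()
    misses = 0
    for addr in trace:
        if addr in locked:
            continue                       # locked block: always a hit
        if addr in cache:
            cache.move_to_end(addr)
            continue
        misses += 1
        if free > 0:
            if len(cache) >= free:
                cache.popitem(last=False)  # victim comes from unlocked blocks only
            cache[addr] = True
    return misses


# Toy trace (an assumption): 5 blocks accessed cyclically thrash a 4-block LRU cache.
trace = [0, 1, 2, 3, 4] * 10
table = build_miss_table(trace, num_blocks=4)
locked = select_locked(table, num_blocks=4, lock_fraction=0.25)
baseline = count_misses(trace, num_blocks=4)
with_locking = count_misses(trace, num_blocks=4, locked=locked)
```

In this toy trace every access misses without locking, while locking one block (25% of the cache) turns all accesses to that block into hits. The numbers say nothing about the delay or power figures reported above; they only show how a miss table can drive the choice of locked blocks.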