Data cache locking for higher program predictability

Authors:
Xavier Vera;Björn Lisper;Jingling Xue
Affiliations:
Institutionen för Datateknik, Mälardalens Högskola, Västerås, Sweden;Institutionen för Datateknik, Mälardalens Högskola, Västerås, Sweden;University of New South Wales, Sydney, Australia
Venue:
SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Year:
2003

Abstract

Caches have become increasingly important with the widening gap between main memory and processor speeds. However, they are a source of unpredictability due to their characteristics, resulting in programs behaving in a different way than expected.

Cache locking mechanisms adapt caches to the needs of real-time systems. Locking the cache is a solution that trades performance for predictability: at a cost of generally lower performance, the time of accessing the memory becomes predictable.

This paper combines compile-time cache analysis with data cache locking to estimate the worst-case memory performance (WCMP) in a safe, tight and fast way. In order to get predictable cache behavior, we first lock the cache for those parts of the code where the static analysis fails. To minimize the performance degradation, our method loads the cache, if necessary, with data likely to be accessed.

Experimental results show that this scheme is fully predictable, without compromising the performance of the transformed program. When compared to an algorithm that assumes compulsory misses when the state of the cache is unknown, our approach eliminates all overestimation for the set of benchmarks, giving an exact WCMP of the transformed program without any significant decrease in performance.
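
The contrast drawn in the abstract, between assuming a miss whenever the cache state is unknown and locking the cache with preloaded data, can be illustrated with a toy model. The sketch below is not the authors' algorithm: the line size, the array layout, and the names lines_of, locked_misses and conservative_bound are all hypothetical, and it ignores preload cost and cache capacity. It only shows why a locked cache gives an exact miss count even when the accessed indices are data dependent.

```python
LINE_SIZE = 32  # bytes per cache line (assumed value, for illustration only)


def lines_of(base, size_bytes, line_size=LINE_SIZE):
    """Return the set of cache-line numbers covered by [base, base + size_bytes)."""
    first = base // line_size
    last = (base + size_bytes - 1) // line_size
    return set(range(first, last + 1))


def locked_misses(addresses, locked_lines, line_size=LINE_SIZE):
    """Count misses under a locked cache.

    The cache contents never change while locked, so an access hits
    exactly when its line was preloaded before locking.
    """
    return sum(1 for a in addresses if a // line_size not in locked_lines)


def conservative_bound(addresses):
    """Baseline estimate: cache state unknown, so every access is assumed to miss."""
    return len(addresses)


# Hypothetical region: accesses of the form a[idx[i]], where idx is only
# known at run time, so a static analysis cannot predict the addresses.
A_BASE, ELEM_SIZE, N = 0x1000, 4, 64
locked = lines_of(A_BASE, N * ELEM_SIZE)  # preload all of `a`, then lock

for idx in ([0, 63, 5, 5, 17], [63, 62, 61, 0, 1]):
    trace = [A_BASE + i * ELEM_SIZE for i in idx]
    print("assume-miss bound:", conservative_bound(trace),
          "locked-cache misses:", locked_misses(trace, locked))
```

In this toy setting both index sequences yield the same locked-cache miss count, whereas the assume-miss baseline charges one miss per access. An access outside the preloaded array would miss under locking, but it would do so deterministically, which is the predictability property the abstract describes.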