Strategies for cache and local memory management by global program transformation
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
An extendible approach for analyzing fixed priority hard real-time tasks
Real-Time Systems
Compiler support for software-based cache partitioning
LCTES '95 Proceedings of the ACM SIGPLAN 1995 workshop on Languages, compilers, & tools for real-time systems
Data transformations for eliminating conflict misses
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Analysis of Cache-Related Preemption Delay in Fixed-Priority Preemptive Scheduling
IEEE Transactions on Computers
Computer architecture (2nd ed.): a quantitative approach
Computer architecture (2nd ed.): a quantitative approach
Reuse-driven tiling for improving data locality
International Journal of Parallel Programming
Cache miss equations: a compiler framework for analyzing and tuning memory behavior
ACM Transactions on Programming Languages and Systems (TOPLAS)
Efficient and Precise Cache Behavior Prediction for Real-TimeSystems
Real-Time Systems
Loop tiling for parallelism
Engineering and Analysis of Fixed Priority Schedulers
IEEE Transactions on Software Engineering
Effective Analysis for Engineering Real-Time Fixed Priority Schedulers
IEEE Transactions on Software Engineering
The Impact of an Ada Run-Time System's Performance Characteristics on Scheduling Models
Ada-Europe '93 Proceedings of the 12th Ada-Europe International Conference
Deriving Annotations for Tight Calculation of Execution Time
Euro-Par '97 Proceedings of the Third International Euro-Par Conference on Parallel Processing
Integrating Path and Timing Analysis Using Instruction-Level Simulation Techniques
LCTES '98 Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems
Automatic Parallelization in the Polytope Model
The Data Parallel Programming Model: Foundations, HPF Realization, and Scientific Applications
Cache Behavior Prediction by Abstract Interpretation
SAS '96 Proceedings of the Third International Symposium on Static Analysis
Data cache locking for higher program predictability
SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Static Locality Analysis for Cache Management
PACT '97 Proceedings of the 1997 International Conference on Parallel Architectures and Compilation Techniques
Efficient worst case timing analysis of data caching
RTAS '96 Proceedings of the 2nd IEEE Real-Time Technology and Applications Symposium (RTAS '96)
Adding instruction cache effect to schedulability analysis of preemptive real-time systems
RTAS '96 Proceedings of the 2nd IEEE Real-Time Technology and Applications Symposium (RTAS '96)
Timing Analysis for Data Caches and Set-Associative Caches
RTAS '97 Proceedings of the 3rd IEEE Real-Time Technology and Applications Symposium (RTAS '97)
OS-Controlled Cache Predictability for Real-Time Systems
RTAS '97 Proceedings of the 3rd IEEE Real-Time Technology and Applications Symposium (RTAS '97)
Pipeline Timing Analysis Using a Trace-Driven Simulator
RTCSA '99 Proceedings of the Sixth International Conference on Real-Time Computing Systems and Applications
A Method to Improve the Estimated Worst-Case Performance of Data Caching
RTCSA '99 Proceedings of the Sixth International Conference on Real-Time Computing Systems and Applications
Efficient microarchitecture modeling and path analysis for real-time software
RTSS '95 Proceedings of the 16th IEEE Real-Time Systems Symposium
Integrating the timing analysis of pipelining and instruction caching
RTSS '95 Proceedings of the 16th IEEE Real-Time Systems Symposium
Timing Anomalies in Dynamically Scheduled Microprocessors
RTSS '99 Proceedings of the 20th IEEE Real-Time Systems Symposium
Low-Complexity Algorithms for Static Cache Locking in Multitasking Hard Real-Time Systems
RTSS '02 Proceedings of the 23rd IEEE Real-Time Systems Symposium
Let's Study Whole-Program Cache Behaviour Analytically
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Efficient, context-sensitive pointer analysis for c programs
Efficient, context-sensitive pointer analysis for c programs
Optimizing Program Locality Through CMEs and GAs
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Data Caches in Multitasking Hard Real-Time Systems
RTSS '03 Proceedings of the 24th IEEE International Real-Time Systems Symposium
A fast and accurate framework to analyze and optimize cache memory behavior
ACM Transactions on Programming Languages and Systems (TOPLAS)
Modeling complex flows for worst-case execution time analysis
RTSS'10 Proceedings of the 21st IEEE conference on Real-time systems symposium
Implementing time-predictable load and store operations
EMSOFT '09 Proceedings of the seventh ACM international conference on Embedded software
Joint task assignment and cache partitioning with cache locking for WCET minimization on MPSoC
Journal of Parallel and Distributed Computing
WCET-aware data selection and allocation for scratchpad memory
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
Static analysis of the worst-case memory performance for irregular codes with indirections
ACM Transactions on Architecture and Code Optimization (TACO)
Data cache organization for accurate timing analysis
Real-Time Systems
Compiler directed write-mode selection for high performance low power volatile PCM
Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Integrated instruction cache analysis and locking in multitasking real-time systems
Proceedings of the 50th Annual Design Automation Conference
Explicit reservation of cache memory in a predictable, preemptive multitasking real-time system
ACM Transactions on Embedded Computing Systems (TECS)
An empirical model for predicting cross-core performance interference on multicore processors
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Optimizing a combined WCET-WCEC problem in instruction fetching for real-time systems
Journal of Systems Architecture: the EUROMICRO Journal
Epipe: A low-cost fault-tolerance technique considering WCET constraints
Journal of Systems Architecture: the EUROMICRO Journal
Address independent estimation of the boundaries of cache performance
Microprocessors & Microsystems
Hi-index | 0.00 |
Caches have become increasingly important with the widening gap between main memory and processor speeds. Small and fast cache memories are designed to bridge this discrepancy. However, they are only effective when programs exhibit sufficient data locality. In addition, caches are a source of unpredictability, resulting in programs sometimes behaving in a different way than expected. Detailed information about the number of cache misses and their causes allows us to predict cache behavior and to detect bottlenecks. Small modifications in the source code may change memory patterns, thereby altering the cache behavior. Code transformations, which take the cache behavior into account, might result in a high cache performance improvement. However, cache memory behavior is very hard to predict, thus making the task of optimizing and timing cache behavior very difficult. This article proposes and evaluates a new compiler framework that times cache behavior for multitasking systems. Our method explores the use of cache partitioning and dynamic cache locking to provide worst-case performance estimates in a safe and tight way for multitasking systems. We use cache partitioning, which divides the cache among tasks to eliminate intertask cache interferences. We combine static cache analysis and cache-locking mechanisms to ensure that all intratask conflicts, and consequently, memory access times, are exactly predictable. The results of our experiments demonstrate the capability of our framework to describe cache behavior at compile time. We compare our timing approach with a system equipped with a nonpartitioned, but statically, locked data cache. Our method outperforms static cache locking for all analyzed task sets under various cache architectures, demonstrating that our fully predictable scheme does not compromise the performance of the transformed programs.