Hiding cache miss penalty using priority-based execution for embedded processors

Authors:
Sanghyun Park; Aviral Shrivastava; Yunheung Paek
Affiliations:
Seoul National University, Korea; Arizona State University; Seoul National University, Korea
Venue:
Proceedings of the conference on Design, automation and test in Europe
Year:
2008

Abstract

The contribution of memory latency to execution time continues to increase, and latency-hiding mechanisms are becoming ever more important for efficient processor design. While high-end processors can use elaborate techniques such as multiple issue, out-of-order execution, speculative execution, and value prediction to tolerate high memory latencies, these techniques are often not viable for embedded processors because of their significant area, power, and chip-complexity overheads. This paper proposes a hardware-software cooperative approach, called priority-based execution, to hide the cache miss penalty in embedded processors. The compiler classifies instructions as either high-priority or low-priority. The processor executes the high-priority instructions right away but delays the low-priority instructions, which are then executed during a cache miss to hide the miss penalty. We empirically evaluate our proposal on the Intel XScale compiler and microarchitecture. Experimental results on benchmarks from Multimedia, MediaBench, MiBench, and SPEC2000 demonstrate an average performance improvement of 17%, hiding 75% of the cache miss penalty.
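
The idea of deferring low-priority instructions and draining them during miss stalls can be illustrated with a toy cycle count. The sketch below is not the paper's compiler or the XScale hardware mechanism; it is a minimal model under several assumptions of our own: every instruction takes one cycle, only high-priority instructions can miss in the cache, deferred instructions have no dependences that would block them, and the names `simulate`, `miss_penalty`, and `priority_based` are hypothetical.

```python
def simulate(program, miss_penalty, priority_based):
    """Count cycles for a toy in-order processor (illustrative model only).

    program: list of (priority, misses) pairs, where priority is "high" or
             "low" and misses is True if the instruction misses in the cache.
             Assumption: only high-priority instructions have misses=True.
    miss_penalty: stall cycles caused by one cache miss.
    priority_based: if True, low-priority instructions are deferred and
                    executed during miss stalls instead of in program order.
    """
    cycles = 0
    deferred = 0  # number of low-priority instructions waiting to run

    for priority, misses in program:
        if priority_based and priority == "low":
            deferred += 1          # postpone; costs no cycle right now
            continue
        cycles += 1                # execute this instruction
        if misses:
            # The stall lasts miss_penalty cycles either way; deferred
            # instructions fill as many of those cycles as they can.
            hidden = min(miss_penalty, deferred)
            deferred -= hidden
            cycles += miss_penalty

    # Any low-priority work that found no stall to hide in runs at the end.
    cycles += deferred
    return cycles


# Hypothetical instruction stream: four low-priority instructions, a
# high-priority load that misses, two more low-priority ones, one high.
program = ([("low", False)] * 4 + [("high", True)]
           + [("low", False)] * 2 + [("high", False)])

print(simulate(program, miss_penalty=3, priority_based=False))  # 11
print(simulate(program, miss_penalty=3, priority_based=True))   # 8
```

In this example the three stall cycles are fully overlapped with deferred work, so the eight instructions finish in eight cycles instead of eleven. When too few low-priority instructions are available before a miss, part of the penalty stays exposed, which is consistent with the abstract reporting that a fraction, not all, of the miss penalty is hidden.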