Predictability of load/store instruction latencies

Authors:
Santosh G. Abraham;Rabin A. Sugumar;Daniel Windheiser;B. R. Rau;Rajiv Gupta
Affiliations:
EECS Department, The University of Michigan, Ann Arbor, MI;EECS Department, The University of Michigan, Ann Arbor, MI and Cray Research, Eau Claire, WI;Cray Research, Eau Claire, WI, EECS Department, The University of Michigan, Ann Arbor, MI;Hewlett Packard Laboratories, 1501 Page Mill Road, Bldg. 3U-7, Palo Alto, CA;Hewlett Packard Laboratories, 1501 Page Mill Road, Bldg. 3U-7, Palo Alto, CA
Venue:
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Year:
1993

Citing 16
Cited 38

A VLIW architecture for a trace Scheduling Compiler

IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
The Cydra 5 Departmental Supercomputer: Design Philosophies, Decisions, and Trade-Offs

Computer
Program optimization for instruction caches

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Achieving high instruction cache performance with an optimizing compiler

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Software prefetching

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Data cache performance of supercomputer applications

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
An architecture for software-controlled data prefetching

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
MemSpy: analyzing memory system bottlenecks in programs

SIGMETRICS '92/PERFORMANCE '92 Proceedings of the 1992 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Profile-guided automatic inline expansion for C programs

Software—Practice & Experience
Reducing memory latency via non-blocking and prefetching caches

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Design and evaluation of a compiler algorithm for prefetching

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Balanced scheduling: instruction scheduling when memory latency is uncertain

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
The Cydra 5 minisupercomputer: architecture and implementation

The Journal of Supercomputing - Special issue on instruction-level parallelism
The superblock: an effective technique for VLIW and superscalar compilation

The Journal of Supercomputing - Special issue on instruction-level parallelism
Register allocation by priority-based coloring

SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
Reduction in main memory traffic through the efficient use of local memory

Reduction in main memory traffic through the efficient use of local memory

A data cache with multiple caching strategies tuned to different types of locality

ICS '95 Proceedings of the 9th international conference on Supercomputing
A modified approach to data cache management

Proceedings of the 28th annual international symposium on Microarchitecture
SPAID: software prefetching in pointer- and call-intensive environments

Proceedings of the 28th annual international symposium on Microarchitecture
Cache miss heuristics and preloading techniques for general-purpose programs

Proceedings of the 28th annual international symposium on Microarchitecture
Memory bandwidth limitations of future microprocessors

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
A quantitative analysis of loop nest locality

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Value locality and load value prediction

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Speculative execution via address prediction and data prefetching

ICS '97 Proceedings of the 11th international conference on Supercomputing
Predicting data cache misses in non-numeric applications through correlation profiling

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Cache sensitive modulo scheduling

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Memory data organization for improved cache performance in embedded processor applications

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Exploiting spatial locality in data caches using spatial footprints

Proceedings of the 25th annual international symposium on Computer architecture
Load latency tolerance in dynamically scheduled processors

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Functional Implementation Techniques for CPU Cache Memories

IEEE Transactions on Computers - Special issue on cache memory and related problems
Quantifying loop nest locality using SPEC'95 and the perfect benchmarks

ACM Transactions on Computer Systems (TOCS)
Understanding Why Correlation Profiling Improves the Predictability of Data Cache Misses in Nonnumeric Applications

IEEE Transactions on Computers
Understanding the backward slices of performance degrading instructions

Proceedings of the 27th annual international symposium on Computer architecture
OS and compiler considerations in the design of the IA-64 architecture

ACM SIGPLAN Notices
OS and compiler considerations in the design of the IA-64 architecture

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Execution-based prediction using speculative slices

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Profile-guided post-link stride prefetching

ICS '02 Proceedings of the 16th international conference on Supercomputing
Reducing set-associative cache energy via way-prediction and selective direct-mapping

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Increasing power efficiency of multi-core network processors through data filtering

CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
Exploiting Value Locality to Exceed the Dataflow Limit

International Journal of Parallel Programming
Transparent Threads: Resource Sharing in SMT Processors for High Single-Thread Performance

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Annotated Memory References: A Mechanism for Informed Cache Management

Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Enhancing memory level parallelism via recovery-free value prediction

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Software assistance for data caches

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
A study of source-level compiler algorithms for automatic construction of pre-execution code

ACM Transactions on Computer Systems (TOCS)
Compiler orchestrated prefetching via speculation and predication

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Performance of Runtime Optimization on BLAST

Proceedings of the international symposium on Code generation and optimization
Enhancing Memory-Level Parallelism via Recovery-Free Value Prediction

IEEE Transactions on Computers
Simple penalty-sensitive replacement policies for caches

Proceedings of the 3rd conference on Computing frontiers
Decomposing memory performance: data structures and phases

Proceedings of the 5th international symposium on Memory management
Hybrid multi-core architecture for boosting single-threaded performance

ACM SIGARCH Computer Architecture News
Latency-tolerant software pipelining in a production compiler

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Enhancing last-level cache performance by block bypassing and early miss determination

ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
Reducing off-chip memory traffic by selective cache management scheme in GPGPUs

Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units

Quantified Score

Hi-index	0.01

Predictability of load/store instruction latencies

Quantified Score

Visualization

Abstract