On the value locality of store instructions

Authors:
Kevin M. Lepak;Mikko H. Lipasti
Affiliations:
Electrical and Computer Engineering, University of Wisconsin, 1415 Engineering Drive, Madison, WI;Electrical and Computer Engineering, University of Wisconsin, 1415 Engineering Drive, Madison, WI
Venue:
Proceedings of the 27th annual international symposium on Computer architecture
Year:
2000

Citing 10
Cited 25

Computer architecture: a quantitative approach

Computer architecture: a quantitative approach
A variable instruction stream extension to the VLIW architecture

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
The detection and elimination of useless misses in multiprocessors

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Internal organization of the Alpha 21164, a 300-MHz 64-bit quad-issue CMOS RISC microprocessor

Digital Technical Journal - Special 10th anniversary issue
Exceeding the dataflow limit via value prediction

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Value profiling

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Selective value prediction

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Reducing Memory Traffic Via Redundant Store Instructions

HPCN Europe '99 Proceedings of the 7th International Conference on High-Performance Computing and Networking
Control-Flow Speculation through Value Prediction for Superscalar Processors

PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Memory dependence prediction

Memory dependence prediction

Slipstream processors: improving both performance and fault tolerance

ACM SIGPLAN Notices
Silent stores for free

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
A study of slipstream processors

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Compiler controlled value prediction using branch predictor based confidence

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Load and store reuse using register file contents

ICS '01 Proceedings of the 15th international conference on Supercomputing
Slipstream processors: improving both performance and fault tolerance

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Better exploration of region-level value locality with integrated computation reuse and value prediction

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Silent Stores and Store Value Locality

IEEE Transactions on Computers
Implementing optimizations at decode time

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Speculative lock elision: enabling highly concurrent multithreaded execution

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Correctly implementing value prediction in microprocessors that support multithreading or multiprocessing

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Temporally silent stores

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Dynamic dead-instruction detection and elimination

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
A Simple Mechanism for Detecting Ineffectual Instructions in Slipstream Processors

IEEE Transactions on Computers
Memory Ordering: A Value-Based Approach

Proceedings of the 31st annual international symposium on Computer architecture
Store Vulnerability Window (SVW): Re-Execution Filtering for Enhanced Load Optimization

Proceedings of the 32nd annual international symposium on Computer Architecture
Understanding prediction-based partial redundant threading for low-overhead, high- coverage fault tolerance

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
A durable and energy efficient main memory using phase change memory technology

Proceedings of the 36th annual international symposium on Computer architecture
Energy reduction for STT-RAM using early write termination

Proceedings of the 2009 International Conference on Computer-Aided Design
Security refresh: prevent malicious wear-out and increase durability for phase-change memory with dynamically randomized address mapping

Proceedings of the 37th annual international symposium on Computer architecture
Exploring the potential of architecture-level power optimizations

PACS'03 Proceedings of the Third international conference on Power - Aware Computer Systems
HiRe: using hint & release to improve synchronization of speculative threads

Proceedings of the 26th ACM international conference on Supercomputing
Edge chasing delayed consistency: pushing the limits of weak memory models

Proceedings of the 2012 ACM workshop on Relaxing synchronization for multicore and manycore scalability
On-chip caches built on multilevel spin-transfer torque RAM cells and its optimizations

ACM Journal on Emerging Technologies in Computing Systems (JETC) - Special issue on memory technologies
VSwapper: a memory swapper for virtualized environments

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

Quantified Score

Hi-index	0.01

Visualization

Abstract

Value locality, a recently discovered program attribute that describes the likelihood of the recurrence of previously-seen program values, has been studied enthusiastically in the recent published literature. Much of the energy has focused on refining the initial efforts at predicting load instruction outcomes, with the balance of the effort examining the value locality of either all register-writing instructions, or a focused subset of them. Surprisingly, there has been very little published characterization of or effort to exploit the value locality of data words stored to memory by computer programs. This paper presents such a characterization, proposes both memory-centric (based on message passing) and producer-centric (based on program structure) prediction mechanisms for stored data values, introduces the concept of silent stores and new definitions of multiprocessor false sharing based on these observations, and suggests new techniques for aligning cache coherence protocols and microarchitectural store handling techniques to exploit the value locality of stores. We find that realistic implementations of these techniques can significantly reduce multiprocessor data bus traffic and are more effective at reducing address bus traffic than the addition of Exclusive state to a MSI coherence protocol. We also show that squashing of silent stores can provide uniprocessor speedups greater than the addition of store-to-load forwarding.