Zero loads: canceling load requests by tracking zero values

Authors:
Mafijul Md. Islam; Per Stenström
Affiliations:
Chalmers University of Technology, Göteborg, Sweden (both authors)
Venue:
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
Year:
2008

Abstract

The considerable gap between processor and DRAM speed and the power losses in the cache hierarchy call for more efficient approaches. Broadly speaking, cache-hierarchy efficiency can be increased either by improving cache management or by reducing the number of load instructions that reach the cache hierarchy. We introduce the notion of Zero Loads to approach the latter. This paper explores the potential of tracking locations that contain the value zero. Loads directed to such locations, termed Zero Loads, can be canceled before they are issued to the cache hierarchy. We find that as many as 21% of the loads are Zero Loads and that about one third of them are critical, i.e., they end up on the critical memory path in out-of-order cores. Motivated by this observation, we explore two new structures that capture Zero Loads by bookkeeping previously visited blocks/locations that returned zero. These schemes are shown to improve performance and power/energy efficiency considerably.