Spatio-temporal memory streaming

Authors:
Stephen Somogyi;Thomas F. Wenisch;Anastasia Ailamaki;Babak Falsafi
Affiliations:
Carnegie Mellon University, Pittsburgh, PA, USA;University of Michigan, Ann Arbor, MI, USA;Ecole Polytechnique Federale de Lausanne, Lausanne, Switzerland;Ecole Polytechnique Federale de Lausanne, Lausanne, Switzerland
Venue:
Proceedings of the 36th annual international symposium on Computer architecture
Year:
2009

Citing 27
Cited 16

Prefetching using Markov predictors

Proceedings of the 24th annual international symposium on Computer architecture
An analysis of database workload performance on simultaneous multithreaded processors

Proceedings of the 25th annual international symposium on Computer architecture
Exploiting spatial locality in data caches using spatial footprints

Proceedings of the 25th annual international symposium on Computer architecture
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Predictor-directed stream buffers

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Dynamic hot data stream prefetching for general-purpose programs

PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Using a user-level memory thread for correlation prefetching

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
DBMSs on a Modern Processor: Where Does Time Go?

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
The Memory Performance of DSS Commercial Workloads in Shared-Memory Multiprocessors

HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling

Proceedings of the 30th annual international symposium on Computer architecture
Scaling and Charact rizing Database Workloads: Bridging the Gap between Research and Practice

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Temporal Streaming of Shared Memory

Proceedings of the 32nd annual international symposium on Computer Architecture
Data Cache Prefetching Using a Global History Buffer

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Accurate and Complexity-Effective Spatial Pattern Prediction

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
DBmbench: fast and accurate database workload representation on modern microarchitecture

CASCON '05 Proceedings of the 2005 conference of the Centre for Advanced Studies on Collaborative research
A Self-Repairing Prefetcher in an Event-Driven Dynamic Optimization Framework

Proceedings of the International Symposium on Code Generation and Optimization
Spatial Memory Streaming

Proceedings of the 33rd annual international symposium on Computer Architecture
SimFlex: Statistical Sampling of Computer System Simulation

IEEE Micro
Stealth prefetching

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Memory Prefetching Using Adaptive Stream Detection

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Mechanisms for store-wait-free multiprocessors

Proceedings of the 34th annual international symposium on Computer architecture
Accelerating and Adapting Precomputation Threads for Effcient Prefetching

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Low-Cost Epoch-Based Correlation Prefetching for Commercial Applications

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Predictor virtualization

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Temporal instruction fetch streaming

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Stream chaining: exploiting multiple levels of correlation in data prefetching

Proceedings of the 36th annual international symposium on Computer architecture
Identifying hierarchical structure in sequences: a linear-time algorithm

Journal of Artificial Intelligence Research

Stream chaining: exploiting multiple levels of correlation in data prefetching

Proceedings of the 36th annual international symposium on Computer architecture
Machine learning-based prefetch optimization for data center applications

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Timing local streams: improving timeliness in data prefetching

Proceedings of the 24th ACM International Conference on Supercomputing
Data marshaling for multi-core architectures

Proceedings of the 37th annual international symposium on Computer architecture
A memory interface for multi-purpose multi-stream accelerators

CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Data-oriented transaction execution

Proceedings of the VLDB Endowment
PLP: page latch-free shared-everything OLTP

Proceedings of the VLDB Endowment
Global-aware and multi-order context-based prefetching for high-performance processors

International Journal of High Performance Computing Applications
ABS: A low-cost adaptive controller for prefetching in a banked shared last-level cache

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Proactive instruction fetch

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Unified memory optimizing architecture: memory subsystem control with a unified predictor

Proceedings of the 26th ACM international conference on Supercomputing
Algorithm-level Feedback-controlled Adaptive data prefetcher: Accelerating data access for high-performance processors

Parallel Computing
Scalable and dynamically balanced shared-everything OLTP with physiological partitioning

The VLDB Journal — The International Journal on Very Large Data Bases
Linearizing irregular memory accesses for improved correlated prefetching

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
SHIFT: shared history instruction fetch for lean-core server processors

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Large-reach memory management unit caches

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

Recent research advocates memory streaming techniques to alleviate the performance bottleneck caused by the high latencies of off-chip memory accesses. Temporal memory streaming replays previously observed miss sequences to eliminate long chains of dependent misses. Spatial memory streaming predicts repetitive data layout patterns within fixed-size memory regions. Because each technique targets a different subset of misses, their effectiveness varies across workloads and each leaves a significant fraction of misses unpredicted. In this paper, we propose Spatio-Temporal Memory Streaming (STeMS) to exploit the synergy between spatial and temporal streaming. We observe that the order of spatial accesses repeats both within and across regions. STeMS records and replays the temporal sequence of region accesses and uses spatial relationships within each region to dynamically reconstruct a predicted total miss order. Using trace-driven and cycle-accurate simulation across a suite of commercial workloads, we demonstrate that with similar implementation complexity as temporal streaming, STeMS achieves equal or higher coverage than spatial or temporal memory streaming alone, and improves performance by 31%, 3%, and 18% over stride, spatial, and temporal prediction, respectively.