Effective jump-pointer prefetching for linked data structures

Authors:
Amir Roth; Gurindar S. Sohi
Affiliations:
Computer Sciences Department, University of Wisconsin, Madison;Computer Sciences Department, University of Wisconsin, Madison
Venue:
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Year:
1999

Abstract

Current techniques for prefetching linked data structures (LDS) exploit the work available in one loop iteration or recursive call to overlap pointer-chasing latency. Jump pointers, which provide direct access to non-adjacent nodes, can be used for prefetching when loop and recursive procedure bodies are small and do not have sufficient work to overlap a long latency. This paper describes a framework for jump-pointer prefetching (JPP) that supports four prefetching idioms (queue, full, chain, and root jumping) and three implementations: software-only, hardware-only, and a cooperative software/hardware technique. On a suite of pointer-intensive programs, jump-pointer prefetching reduces memory stall time by 72% for software, 83% for cooperative, and 55% for hardware, producing speedups of 15%, 20%, and 22%, respectively.
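
To make the idea of a jump pointer concrete, the following is a minimal software-only sketch for a singly linked list. It is an illustration under stated assumptions, not the paper's implementation: the `node` layout, the `JUMP_INTERVAL` constant, and the function names `install_jump_pointers` and `sum_with_prefetch` are all hypothetical, and the use of a small queue of recently visited nodes is one plausible reading of the "queue jumping" idiom named in the abstract. The prefetch itself relies on the GCC/Clang `__builtin_prefetch` builtin and is skipped on other compilers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance, in nodes. A real value would be tuned
 * so that the prefetch covers the memory latency. */
#define JUMP_INTERVAL 4

typedef struct node {
    int value;
    struct node *next;
    struct node *jump;  /* node JUMP_INTERVAL hops ahead, or NULL */
} node;

/* Install jump pointers in one traversal. A circular queue holds the last
 * JUMP_INTERVAL nodes visited; each newly visited node becomes the jump
 * target of the node that is about to leave the queue. */
void install_jump_pointers(node *head)
{
    node *queue[JUMP_INTERVAL] = {0};
    size_t slot = 0;

    assert(JUMP_INTERVAL > 0);
    for (node *n = head; n != NULL; n = n->next) {
        n->jump = NULL;
        if (queue[slot] != NULL)
            queue[slot]->jump = n;  /* visited JUMP_INTERVAL steps ago */
        queue[slot] = n;
        slot = (slot + 1) % JUMP_INTERVAL;
    }
}

/* Traverse the list, prefetching through the jump pointer so that the
 * distant node is (ideally) cached by the time the loop reaches it. */
long sum_with_prefetch(const node *head)
{
    long total = 0;

    for (const node *n = head; n != NULL; n = n->next) {
#if defined(__GNUC__) || defined(__clang__)
        if (n->jump != NULL)
            __builtin_prefetch(n->jump);
#endif
        total += n->value;
    }
    return total;
}
```

The loop body here is deliberately tiny, which is the case the abstract singles out: there is too little work per iteration to hide the latency of `n->next`, so the prefetch is issued several nodes ahead through `n->jump` instead.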