Communications of the ACM - Special issue on parallelism
Skip lists: a probabilistic alternative to balanced trees
Communications of the ACM
Introduction to algorithms
Supporting dynamic data structures on distributed-memory machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
SPAID: software prefetching in pointer- and call-intensive environments
Proceedings of the 28th annual international symposium on Microarchitecture
Compiler-based prefetching for recursive data structures
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Examination of a memory access classification scheme for pointer-intensive and numeric programs
ICS '96 Proceedings of the 10th international conference on Supercomputing
Exploiting dead value information
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Using generational garbage collection to implement cache-conscious data placement
Proceedings of the 1st international symposium on Memory management
Segregating heap objects by reference behavior and lifetime
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Dependence based prefetching for linked data structures
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Cache-conscious data placement
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Cache-conscious structure layout
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
A nonrecursive list compacting algorithm
Communications of the ACM
A LISP garbage-collector for virtual-memory computer systems
Communications of the ACM
Data Structure Techniques
IEEE Transactions on Computers
Push vs. pull: data movement for linked data structures
Proceedings of the 14th international conference on Supercomputing
Understanding the backward slices of performance degrading instructions
Proceedings of the 27th annual international symposium on Computer architecture
Proceedings of the 27th annual international symposium on Computer architecture
Predictor-directed stream buffers
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
PipeRench implementation of the instruction path coprocessor
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
ICS '01 Proceedings of the 15th international conference on Supercomputing
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Software caching vs. prefetching
Proceedings of the 3rd international symposium on Memory management
Dynamic hot data stream prefetching for general-purpose programs
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Profile-guided post-link stride prefetching
ICS '02 Proceedings of the 16th international conference on Supercomputing
Difficult-path branch prediction using subordinate microthreads
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A Decoupled Predictor-Directed Stream Prefetching Architecture
IEEE Transactions on Computers
Instruction Level Distributed Processing
HiPC '00 Proceedings of the 7th International Conference on High Performance Computing
Instruction Level Distributed Processing: Adapting to Future Technology
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
A Programmable Memory Hierarchy for Prefetching Linked Data Structures
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Value-Profile Guided Stride Prefetching for Irregular Code
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Content-Based Prefetching: Initial Results
IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
Automatic pool allocation for disjoint data structures
Proceedings of the 2002 workshop on Memory system performance
Pointer cache assisted prefetching
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Guided region prefetching: a cooperative hardware/software approach
Proceedings of the 30th annual international symposium on Computer architecture
The Performance of Runtime Data Cache Prefetching in a Dynamic Optimization System
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
IEEE Transactions on Computers
ACM Transactions on Computer Systems (TOCS)
Prefetch injection based on hardware monitoring and object metadata
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Software prefetching for mark-sweep garbage collection: hardware analysis and software redesign
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Decoupled Software Pipelining with the Synchronization Array
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Tolerating memory latency through push prefetching for pointer-intensive applications
ACM Transactions on Architecture and Code Optimization (TACO)
Exploring the limits of prefetching
IBM Journal of Research and Development - Electrochemical technology in microelectronics
Memory-side prefetching for linked data structures for processor-in-memory systems
Journal of Parallel and Distributed Computing
PARE: a power-aware hardware data prefetching engine
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
On the performance of trace locality of reference
Performance Evaluation - Performance modelling and evaluation of high-performance parallel and distributed systems
Design and Implementation of a Compiler Framework for Helper Threading on Multi-core Processors
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
MEDEA '04 Proceedings of the 2004 workshop on MEmory performance: DEaling with Applications , systems and architecture
Accelerating database operators using a network processor
DaMoN '05 Proceedings of the 1st international workshop on Data management on new hardware
A Self-Repairing Prefetcher in an Event-Driven Dynamic Optimization Framework
Proceedings of the International Symposium on Code Generation and Optimization
Intelligent memory manager: reducing cache pollution due to memory management functions
Journal of Systems Architecture: the EUROMICRO Journal
Decomposing memory performance: data structures and phases
Proceedings of the 5th international symposium on Memory management
Accelerating sequential programs on Chip Multiprocessors via Dynamic Prefetching Thread
Microprocessors & Microsystems
HAT-trie: a cache-conscious trie-based data structure for strings
ACSC '07 Proceedings of the thirtieth Australasian conference on Computer science - Volume 62
Journal of Embedded Computing - Embeded Processors and Systems: Architectural Issues and Solutions for Emerging Applications
PFetch: software prefetching exploiting temporal predictability of memory access streams
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
Multiprocessor System-on-Chip designs with active memory processors for higher memory efficiency
Proceedings of the 46th Annual Design Automation Conference
Engineering scalable, cache and space efficient tries for strings
The VLDB Journal — The International Journal on Very Large Data Bases
Redesigning the string hash table, burst trie, and BST to exploit cache
Journal of Experimental Algorithmics (JEA)
Template-based memory access engine for accelerators in SoCs
Proceedings of the 16th Asia and South Pacific Design Automation Conference
DRAM energy reduction by prefetching-based memory traffic clustering
Proceedings of the 21st edition of the great lakes symposium on Great lakes symposium on VLSI
Energy-efficient hardware data prefetching
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
When Prefetching Works, When It Doesn’t, and Why
ACM Transactions on Architecture and Code Optimization (TACO)
Energy-aware data prefetching for general-purpose programs
PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems
Cache-Conscious collision resolution in string hash tables
SPIRE'05 Proceedings of the 12th international conference on String Processing and Information Retrieval
PPMC: a programmable pattern based memory controller
ARC'12 Proceedings of the 8th international conference on Reconfigurable Computing: architectures, tools and applications
Pointy: a hybrid pointer prefetcher for managed runtime systems
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Linearizing irregular memory accesses for improved correlated prefetching
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.01 |
Current techniques for prefetching linked data structures (LDS) exploit the work available in one loop iteration or recursive call to overlap pointer chasing latency. Jump pointers, which provide direct access to non-adjacent nodes, can be used for prefetching when loop and recursive procedure bodies are small and do not have sufficient work to overlap a long latency. This paper describes a framework for jump-pointer prefetching (JPP) that supports four prefetching idioms: queue, full, chain, and root jumping and three implementations: software-only, hardware-only, and a cooperative software/hardware technique. On a suite of pointer intensive programs, jump pointer prefetching reduces memory stall time by 72% for software, 83% for cooperative and 55% for hardware, producing speedups of 15%, 20% and 22% respectively.