Compilers: principles, techniques, and tools
Compilers: principles, techniques, and tools
Memory coherence in shared virtual memory systems
ACM Transactions on Computer Systems (TOCS)
Tolerating latency through software-controlled prefetching in shared-memory multiprocessors
Journal of Parallel and Distributed Computing - Special issue on shared-memory multiprocessors
SPLASH: Stanford parallel applications for shared-memory
ACM SIGARCH Computer Architecture News
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
A performance study of software and hardware data prefetching schemes
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tolerating latency through software-controlled data prefetching
Tolerating latency through software-controlled data prefetching
An effective programmable prefetch engine for on-chip caches
Proceedings of the 28th annual international symposium on Microarchitecture
Informing memory operations: providing memory performance feedback in modern processors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Compiler-based prefetching for recursive data structures
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Portable Programs for Parallel Processors
Portable Programs for Parallel Processors
Sequential Hardware Prefetching in Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Informing Loads: Enabling Software to Observe and React to Memory Behavior
Informing Loads: Enabling Software to Observe and React to Memory Behavior
A New Solution to Coherence Problems in Multicache Systems
IEEE Transactions on Computers
Optimizing Overall Loop Schedules Using Prefetching and Partitioning
IEEE Transactions on Parallel and Distributed Systems
Array allocation taking into account SDRAM characteristics
ASP-DAC '00 Proceedings of the 2000 Asia and South Pacific Design Automation Conference
Loop Scheduling and Partitions for Hiding Memory Latencies
Proceedings of the 12th international symposium on System synthesis
Guided region prefetching: a cooperative hardware/software approach
Proceedings of the 30th annual international symposium on Computer architecture
Fighting the memory wall with assisted execution
Proceedings of the 1st conference on Computing frontiers
On the performance of trace locality of reference
Performance Evaluation - Performance modelling and evaluation of high-performance parallel and distributed systems
Iterational retiming with partitioning: Loop scheduling with complete memory latency hiding
ACM Transactions on Embedded Computing Systems (TECS)
An Adaptive Data Prefetcher for High-Performance Processors
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Tackling cache-line stealing effects using run-time adaptation
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Hi-index | 0.00 |
We propose and evaluate a new data prefetching technique for cache coherent multiprocessors. Prefetches are issued by a prefetch engine which is controlled by the compiler. Second-level cache misses generate cache miss traps, and start the prefetch engine in a trap handler generated by the compiler. The only instruction overhead in our approach is when a trap handler terminates after data arrives. We present the functionality of the prefetch engine and a compiler algorithm to control it. We also study emulation of the prefetch engine in software. Our techniques are evaluated on six parallel applications using a compiler which incorporates our algorithm and a simulated multiprocessor. The prefetch engines remove up to 67% of the memory access stall time at an instruction overhead less than 0.42%. The emulated prefetch engines remove in general less stall time at a higher instruction overhead.