ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Tolerating latency through software-controlled prefetching in shared-memory multiprocessors
Journal of Parallel and Distributed Computing - Special issue on shared-memory multiprocessors
Data prefetching in multiprocessor vector cache memories
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Comparative evaluation of latency reducing and tolerating techniques
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
An effective on-chip preloading scheme to reduce data access penalty
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
SPLASH: Stanford parallel applications for shared-memory
ACM SIGARCH Computer Architecture News
Reducing memory latency via non-blocking and prefetching caches
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Stride directed prefetching in scalar processors
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Compiler-directed data prefetching in multiprocessors with memory hierarchies
ICS '90 Proceedings of the 4th international conference on Supercomputing
The directory-based cache coherence protocol for the DASH multiprocessor
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
ACM Computing Surveys (CSUR)
Mint Tutorial and User Manual
Fixed and Adaptive Sequential Prefetching in Shared Memory Multiprocessors
ICPP '93 Proceedings of the 1993 International Conference on Parallel Processing - Volume 01
A New Solution to Coherence Problems in Multicache Systems
IEEE Transactions on Computers
Adapting cache line size to application behavior
ICS '99 Proceedings of the 13th international conference on Supercomputing
Optimizing Overall Loop Schedules Using Prefetching and Partitioning
IEEE Transactions on Parallel and Distributed Systems
Minimizing Average Schedule Length under Memory Constraints by Optimal Partitioning and Prefetching
Journal of VLSI Signal Processing Systems
Loop Scheduling and Partitions for Hiding Memory Latencies
Proceedings of the 12th international symposium on System synthesis
Loop Scheduling with Complete Memory Latency Hiding on Multi-core Architecture
ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 1
Partitioning and scheduling DSP applications with maximal memory access hiding
EURASIP Journal on Applied Signal Processing
Optimal multistream sequential prefetching in a shared cache
ACM Transactions on Storage (TOS)
TaP: table-based prefetching for storage caches
FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
Iterational retiming with partitioning: Loop scheduling with complete memory latency hiding
ACM Transactions on Embedded Computing Systems (TECS)
ABS: A low-cost adaptive controller for prefetching in a banked shared last-level cache
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Hi-index | 0.00 |
The sequential prefetching scheme is a simple hardware controlled scheme, which exploits the sequentiality of memory accesses to predict which blocks will be read in the near future. We analyze the relationship between the sequentiality of application programs and the effectiveness of sequential prefetching on shared-memory multiprocessors. Also, we propose a simple hardware scheme which selects the prefetching degree on each miss by adding a small table (PDS: Prefetching Degree Selector) to the sequential prefetching scheme. This scheme could prefetch consecutive blocks aggressively for applications with high sequentiality and conservatively for applications with low sequentiality.