Performance evaluation of memory consistency models for shared-memory multiprocessors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Tolerating latency through software-controlled prefetching in shared-memory multiprocessors
Journal of Parallel and Distributed Computing - Special issue on shared-memory multiprocessors
Data prefetching in multiprocessor vector cache memories
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Comparative evaluation of latency reducing and tolerating techniques
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
An effective on-chip preloading scheme to reduce data access penalty
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
SPLASH: Stanford parallel applications for shared-memory
ACM SIGARCH Computer Architecture News
Prefetch unit for vector operations on scalar computers
ACM SIGARCH Computer Architecture News
Stride directed prefetching in scalar processors
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Combined performance gains of simple cache protocol extensions
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
A performance study of software and hardware data prefetching schemes
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tolerating latency through software-controlled data prefetching
Tolerating latency through software-controlled data prefetching
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Effectiveness of hardware-based stride and sequential prefetching in shared-memory multiprocessors
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Highly accurate data value prediction using hybrid predictors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Hardware-driven prefetching for pointer data references
ICS '98 Proceedings of the 12th international conference on Supercomputing
Predicting the performance of distributed virtual shared-memory applications
IBM Systems Journal
IEEE Transactions on Computers
Cyclic dependence based data reference prediction
ICS '99 Proceedings of the 13th international conference on Supercomputing
Proceedings of the 27th annual international symposium on Computer architecture
Improving Memory Traffic by Assembly-Level Exploitation of Reuses for Vector Registers
The Journal of Supercomputing
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Profile-guided post-link stride prefetching
ICS '02 Proceedings of the 16th international conference on Supercomputing
DSTRIDE: data-cache miss-address-based stride prefetching scheme for multimedia processors
ACSAC '01 Proceedings of the 6th Australasian conference on Computer systems architecture
Cache-Only Memory Architectures
Computer
Combining Loop Fusion with Prefetching on Shared-memory Multiprocessors
ICPP '97 Proceedings of the international Conference on Parallel Processing
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Value-Profile Guided Stride Prefetching for Irregular Code
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Fighting the memory wall with assisted execution
Proceedings of the 1st conference on Computing frontiers
Merging, sorting and matrix operations on the SOME-bus multiprocessor architecture
Future Generation Computer Systems - Special issue: Advanced services for clusters and internet computing
An Efficient Value Predictor Dynamically Using Loop and Locality Properties
The Journal of Supercomputing
Optimal multistream sequential prefetching in a shared cache
ACM Transactions on Storage (TOS)
Low-Cost Adaptive Data Prefetching
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Memory resource allocation for file system prefetching: from a supply chain management perspective
Proceedings of the 4th ACM European conference on Computer systems
Conjugate gradient sparse solvers: performance-power characteristics
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Dual-addressing memory architecture for two-dimensional memory access patterns
Proceedings of the Conference on Design, Automation and Test in Europe
Hi-index | 0.01 |
We study the efficiency of previously proposed stride and sequential prefetching驴two promising hardware-based prefetching schemes to reduce read-miss penalties in shared-memory multiprocessors. Although stride accesses dominate in four out of six of the applications we study, we find that sequential prefetching does as well as and in same cases even better than stride prefetching for five applications. This is because 1) most strides are shorter than the block size (we assume 32 byte blocks), which means that sequential prefetching is as effective for these stride accesses, and 2) sequential prefetching also exploits the locality of read misses with nonstride accesses. However, since stride prefetching in general results in fewer useless prefetches, it offers the extra advantage of consuming less memory-system bandwidth.