An adaptive sequential prefetching scheme in shared-memory multiprocessors
ICPP '97 Proceedings of the international Conference on Parallel Processing
Sequential Unification and Aggressive Lookahead Mechanisms for Data Memory Accesses
PaCT '999 Proceedings of the 5th International Conference on Parallel Computing Technologies
Compiler-Directed Cache Assist Adaptivity
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Effectiveness of hardware-based stride and sequential prefetching in shared-memory multiprocessors
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Reducing Cache Pollution via Dynamic Data Prefetch Filtering
IEEE Transactions on Computers
Optimal multistream sequential prefetching in a shared cache
ACM Transactions on Storage (TOS)
Improving SDRAM access energy efficiency for low-power embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Data access history cache and associated data prefetching mechanisms
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
TaP: table-based prefetching for storage caches
FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
Server-based data push architecture for multi-processor environments
Journal of Computer Science and Technology
Low-Cost Adaptive Data Prefetching
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
A compiler-directed data prefetching scheme for chip multiprocessors
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Revisiting Cache Block Superloading
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Coordinated control of multiple prefetchers in multi-core systems
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Inter-core cooperative TLB for chip multiprocessors
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Timing local streams: improving timeliness in data prefetching
Proceedings of the 24th ACM International Conference on Supercomputing
An Adaptive Data Prefetcher for High-Performance Processors
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Adaptive prefetching for shared cache based chip multiprocessors
Proceedings of the Conference on Design, Automation and Test in Europe
Improving cache locality for thread-level speculation
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A cost-intelligent application-specific data layout scheme for parallel file systems
Proceedings of the 20th international symposium on High performance distributed computing
Bandwidth constrained coordinated HW/SW prefetching for multicores
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Global-aware and multi-order context-based prefetching for high-performance processors
International Journal of High Performance Computing Applications
ABS: A low-cost adaptive controller for prefetching in a banked shared last-level cache
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
An effective instruction cache prefetch policy by exploiting cache history information
EUC'05 Proceedings of the 2005 international conference on Embedded and Ubiquitous Computing
Application-Specific hardware-driven prefetching to improve data cache performance
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Hi-index | 0.00 |
To offset the effect of read miss penalties on processor utilization in shared-memory multiprocessors, several software- and hardware-based data prefetching schemes have been proposed. A major advantage of hardware tech niques is that they need no support from the programmer or compiler. Sequential prefetching is a simple hardware-controlled prefetching technique which relies on the automatic prefetch of consecutive blocks following the block that misses in the cache. In its simplest form, the number of prefetched blocks on each miss is fixed throughout the exe cution. However, since the prefetching efficiency varies during the execution of a program, we propose to adapt the number of pref etched blocks according to a dynamic measure of prefetching effectiveness. Simulations of this adaptive scheme show significant reductions of the read penalty and of the overall execution time.