A set of level 3 basic linear algebra subprograms
ACM Transactions on Mathematical Software (TOMS)
Tolerating latency through software-controlled prefetching in shared-memory multiprocessors
Journal of Parallel and Distributed Computing - Special issue on shared-memory multiprocessors
Data prefetching in multiprocessor vector cache memories
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Hitting the memory wall: implications of the obvious
ACM SIGARCH Computer Architecture News
Prefetching using Markov predictors
Proceedings of the 24th annual international symposium on Computer architecture
Dependence based prefetching for linked data structures
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
The Impact of Exploiting Instruction-Level Parallelism on Shared-Memory Multiprocessors
IEEE Transactions on Computers - Special issue on cache memory and related problems
Push vs. pull: data movement for linked data structures
Proceedings of the 14th international conference on Supercomputing
Execution-based prediction using speculative slices
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Data prefetching by dependence graph precomputation
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Post-pass binary adaptation for software-based speculative precomputation
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Using a user-level memory thread for correlation prefetching
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Going the distance for TLB prefetching: an application-driven study
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
Time Series Analysis: Forecasting and Control
Time Series Analysis: Forecasting and Control
Dynamic speculative precomputation
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Effective Hardware-Based Data Prefetching for High-Performance Processors
IEEE Transactions on Computers
Sequential Hardware Prefetching in Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Branch-Directed and Stride-Based Data Cache Prefetching
ICCD '96 Proceedings of the 1996 International Conference on Computer Design, VLSI in Computers and Processors
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Decoupled access/execute computer architectures
ISCA '82 Proceedings of the 9th annual symposium on Computer Architecture
Distributed Prefetch-buffer/Cache Design for High Performance Memory Systems
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Speculative Data-Driven Multithreading
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Data forwarding through in-memory precomputation threads
Proceedings of the 18th annual international conference on Supercomputing
When prefetching improves/degrades performance
Proceedings of the 2nd conference on Computing frontiers
Dual-Core Execution: Building a Highly Scalable Single-Thread Instruction Window
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Future Execution: A Hardware Prefetching Technique for Chip Multiprocessors
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Predicting memory-access cost based on data-access patterns
CLUSTER '04 Proceedings of the 2004 IEEE International Conference on Cluster Computing
Computer Architecture, Fourth Edition: A Quantitative Approach
Computer Architecture, Fourth Edition: A Quantitative Approach
Fixed and Adaptive Sequential Prefetching in Shared Memory Multiprocessors
ICPP '93 Proceedings of the 1993 International Conference on Parallel Processing - Volume 01
Timing local streams: improving timeliness in data prefetching
Proceedings of the 24th ACM International Conference on Supercomputing
Hi-index | 0.00 |
Data access delay is a major bottleneck in utilizing current high-end computing (HEC) machines. Prefetching, where data is fetched before CPU demands for it, has been considered as an effective solution to masking data access delay. However, current client-initiated prefetching strategies, where a computing processor initiates prefetching instructions, have many limitations. They do not work well for applications with complex, non-contiguous data access patterns. While technology advances continue to increase the gap between computing and data access performance, trading computing power for reducing data access delay has become a natural choice. In this paper, we present a server-based data-push approach and discuss its associated implementation mechanisms. In the server-push architecture, a dedicated server called Data Push Server (DPS) initiates and proactively pushes data closer to the client in time. Issues, such as what data to fetch, when to fetch, and how to push are studied. The SimpleScalar simulator is modified with a dedicated prefetching engine that pushes data for another processor to test DPS based prefetching. Simulation results show that L1 Cache miss rate can be reduced by up to 97% (71% on average) over a superscalar processor for SPEC CPU2000 benchmarks that have high cache miss rates.