Server-based data push architecture for multi-processor environments

Authors:
Xian-He Sun;Surendra Byna;Yong Chen
Affiliations:
Department of Computer Science, Illinois Institute of Technology, Chicago, Illinois and Computing Division, Fermi National Accelerator Laboratory, Batavia, IL;Department of Computer Science, Illinois Institute of Technology, Chicago, Illinois;Department of Computer Science, Illinois Institute of Technology, Chicago, Illinois
Venue:
Journal of Computer Science and Technology
Year:
2007

Abstract

Data access delay is a major bottleneck in utilizing current high-end computing (HEC) machines. Prefetching, where data is fetched before the CPU demands it, has been considered an effective solution for masking data access delay. However, current client-initiated prefetching strategies, where a computing processor issues the prefetching instructions, have many limitations. In particular, they do not work well for applications with complex, non-contiguous data access patterns. As technology advances continue to widen the gap between computing and data access performance, trading computing power for reduced data access delay has become a natural choice. In this paper, we present a server-based data-push approach and discuss its associated implementation mechanisms. In the server-push architecture, a dedicated server called the Data Push Server (DPS) initiates data movement and proactively pushes data closer to the client in time. Issues such as what data to fetch, when to fetch it, and how to push it are studied. To test DPS-based prefetching, the SimpleScalar simulator is modified with a dedicated prefetching engine that pushes data for another processor. Simulation results show that the L1 cache miss rate can be reduced by up to 97% (71% on average) over a superscalar processor for SPEC CPU2000 benchmarks that have high cache miss rates.
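
The push idea can be illustrated with a toy simulation. The sketch below is not the DPS design or the SimpleScalar setup evaluated in the paper. It assumes a small LRU cache of block addresses and a hypothetical `StridePushServer` that watches the client's access stream and, once it sees the same stride twice in a row, pushes the next few blocks into the client's cache. All class names, the stride predictor, the cache size, and the trace are assumptions made for illustration only.

```python
from collections import OrderedDict


class Cache:
    """Tiny LRU cache of block addresses (hypothetical stand-in for an L1 cache)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def contains(self, block):
        return block in self.blocks

    def touch(self, block):
        # Mark a resident block as most recently used.
        self.blocks.move_to_end(block)

    def insert(self, block):
        # Add (or refresh) a block, evicting the least recently used if full.
        self.blocks[block] = True
        self.blocks.move_to_end(block)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)


class StridePushServer:
    """Hypothetical push server: predicts a constant stride and pushes ahead."""

    def __init__(self, cache, depth=2):
        self.cache = cache
        self.depth = depth      # how many blocks to push ahead
        self.last = None        # last block observed
        self.stride = None      # last stride observed

    def observe(self, block):
        if self.last is not None:
            stride = block - self.last
            # Push only after the same non-zero stride is seen twice in a row.
            if stride != 0 and stride == self.stride:
                for k in range(1, self.depth + 1):
                    self.cache.insert(block + k * stride)
            self.stride = stride
        self.last = block


def miss_rate(trace, capacity=8, push=False, depth=2):
    """Replay a block-address trace and return the fraction of demand misses."""
    cache = Cache(capacity)
    server = StridePushServer(cache, depth) if push else None
    misses = 0
    for block in trace:
        if cache.contains(block):
            cache.touch(block)
        else:
            misses += 1
            cache.insert(block)
        if server is not None:
            server.observe(block)
    return misses / len(trace)


# Assumed trace: 100 accesses with a constant stride of 4 blocks.
trace = [4 * i for i in range(100)]
baseline = miss_rate(trace, push=False)
pushed = miss_rate(trace, push=True)
print(f"demand-only miss rate: {baseline:.2f}, with push: {pushed:.2f}")
```

On this perfectly regular trace only the first few accesses miss once pushing is enabled, because the server needs two identical strides before it starts pushing. Real access patterns are far less regular, which is why the questions of what to push and when to push it matter.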