Accelerating sequential programs on Chip Multiprocessors via Dynamic Prefetching Thread

Authors:
Hou Rui;Longbing Zhang;Weiwu Hu
Affiliations:
Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China;Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China;Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
Venue:
Microprocessors & Microsystems
Year:
2007



Abstract

This paper proposes a Dynamic Prefetching Thread scheme to accelerate sequential programs on Chip Multiprocessors (CMPs). The scheme is a hardware-generated, thread-based prefetching technique, and it decouples performance from correctness to some extent. We describe the hardware infrastructure needed to support Dynamic Prefetching Threads on a traditional CMP. Because the cores of a CMP are loosely coupled, we present a "Shadow Register" mechanism that supports rapid register transfer between cores, and we discuss the choice of thread spawn time. We further propose two aggressive thread construction policies, "Self-Loop" and "Fork-on-Recursive-Call". The Self-Loop policy greatly enlarges the prefetching range and issues more timely prefetches. The Fork-on-Recursive-Call policy effectively accelerates applications that traverse trees or graphs through recursive calls. On a set of memory-bound benchmarks selected from the Olden, SPEC CPU2000, and Stream suites, basic Dynamic Prefetching Threads achieve an average speedup of 3.8% on a dual-core CMP, and the aggressive thread construction policies raise this gain to 29.6%.
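The general idea behind thread-based prefetching can be illustrated with a toy software model. The sketch below is not the hardware mechanism of the paper: it only simulates a helper that touches data a fixed number of nodes ahead of the main traversal, so that the main thread later finds those blocks in a shared cache. All names (`LRUCache`, `main_thread_misses`, `capacity`, `distance`), the fully associative LRU cache, and the fixed run-ahead distance are assumptions made for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative LRU cache of block addresses (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def access(self, addr):
        """Return True on a hit; on a miss, fill the block and return False."""
        if addr in self.blocks:
            self.blocks.move_to_end(addr)  # mark as most recently used
            return True
        self.blocks[addr] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return False


def main_thread_misses(addresses, capacity, distance):
    """Count main-thread misses when a helper runs `distance` nodes ahead.

    `distance == 0` models execution without any prefetching helper.
    Helper misses are not charged to the main thread in this model.
    """
    cache = LRUCache(capacity)
    misses = 0
    for i, addr in enumerate(addresses):
        ahead = i + distance
        if distance > 0 and ahead < len(addresses):
            cache.access(addresses[ahead])  # helper warms the shared cache
        if not cache.access(addr):
            misses += 1
    return misses


# Hypothetical traversal: 100 distinct blocks, each visited once.
addresses = [i * 64 for i in range(100)]
print("no helper:  ", main_thread_misses(addresses, capacity=16, distance=0))
print("with helper:", main_thread_misses(addresses, capacity=16, distance=4))
```

In this model the helper only pays off when the cache can hold the blocks touched between a prefetch and its use. If the run-ahead distance is too large for the cache, prefetched blocks are evicted before the main thread reaches them. That trade-off is a property of the toy model, not a result from the paper.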