Accelerating sequential programs on Chip Multiprocessors via Dynamic Prefetching Thread

  • Authors:
  • Hou Rui; Longbing Zhang; Weiwu Hu

  • Affiliations:
  • Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China

  • Venue:
  • Microprocessors & Microsystems
  • Year:
  • 2007

Abstract

A Dynamic Prefetching Thread scheme is proposed in this paper to accelerate sequential programs on Chip Multiprocessors (CMPs). The scheme belongs to the class of hardware-generated, thread-based prefetching techniques and can, to some extent, decouple performance from correctness. This paper describes the hardware infrastructure needed to support Dynamic Prefetching Threads on traditional CMPs. To cope with the loosely coupled nature of CMPs, we present a "Shadow Register" mechanism that supports rapid register transfer among cores, and we discuss when threads should be spawned. Furthermore, we propose two aggressive thread construction policies, "Self-Loop" and "Fork-on-Recursive-Call". The Self-Loop policy can greatly enlarge the prefetching range and issue more timely prefetches, and the Fork-on-Recursive-Call policy can effectively accelerate applications that access trees or graphs through recursive calls. For a set of memory-limited benchmarks selected from the Olden suite, SPEC CPU2000 and the Stream benchmark, basic Dynamic Prefetching Threads achieve an average speedup of 3.8% on a dual-core CMP, and this gain grows to 29.6% with the aggressive thread construction policies.
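The abstract does not spell out the hardware mechanics. A toy timing model can still show the basic reason a prefetching thread helps a pointer-chasing loop. The Python sketch below illustrates that general idea under stated assumptions and does not reproduce the paper's scheme. In it, a helper thread executes only the address-generating slice of a linked-list traversal (the `next` pointer loads). Its only effect is to make each node cache-resident earlier. Because it never updates program state, it can change performance but not correctness. The class `TimingModel`, the functions `main_thread_only` and `with_prefetch_thread`, and all latency and work parameters are hypothetical. The model ignores cache capacity, memory bandwidth, thread spawn cost and register transfer between cores.

```python
from dataclasses import dataclass


@dataclass
class TimingModel:
    # All values are hypothetical cycle counts chosen for illustration.
    miss_latency: int = 200   # cost of a load that misses in the cache
    hit_latency: int = 1      # cost of a load that hits in the cache
    work_per_node: int = 50   # main-thread computation per list node


def main_thread_only(n_nodes: int, m: TimingModel) -> int:
    """Cycles to traverse n_nodes when every node load misses and each
    load must finish before that node's computation starts."""
    return n_nodes * (m.miss_latency + m.work_per_node)


def with_prefetch_thread(n_nodes: int, m: TimingModel) -> int:
    """Cycles for the main thread when a hypothetical helper thread,
    spawned at the start of the traversal, runs only the pointer-chasing
    slice. The helper's own loads are serialized by the pointer
    dependence, so node i becomes cache-resident at (i + 1) * miss_latency.
    """
    ready = [(i + 1) * m.miss_latency for i in range(n_nodes)]
    t = 0
    for i in range(n_nodes):
        # Hit if the helper has already fetched node i; otherwise wait for
        # the helper's in-flight miss on node i to complete. (The main
        # thread cannot reach node i before node i-1 is ready, so the
        # helper's load of node i has always been issued by then.)
        t = max(t + m.hit_latency, ready[i])
        t += m.work_per_node  # computation on node i
    return t


if __name__ == "__main__":
    m = TimingModel()
    print(main_thread_only(1000, m), with_prefetch_thread(1000, m))
    # prints: 250000 200050
```

With these assumed parameters the traversal drops from 250,000 to 200,050 cycles. The helper cannot fetch any single node faster than one miss latency, so the entire gain comes from overlapping each node's computation with the miss for the next node. With `work_per_node=0`, both functions return 200,000 cycles for 1,000 nodes. In this model, a prefetching thread pays off only when the main thread has work behind which the misses can be hidden.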