StreamTMC: Stream compilation for tiled multi-core architectures

  • Authors:
  • Haitao Wei;Mingkang Qin;Weiwei Zhang;Junqing Yu;Dongrui Fan;Guang R. Gao

  • Affiliations:
  • School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China;School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China;School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China;School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China and Center of Network and Computation, Huazhong University of Science and Technology, ...;Key Laboratory of Computer Systems and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19716, United States

  • Venue:
  • Journal of Parallel and Distributed Computing
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Tiled multi-core architectures have become an important kind of multi-core design for its good scalability and low power consumption. Stream programming has been productively applied to a number of important application domains. It provides an attractive way to exploit the parallelism. However, the architecture characteristics of large amounts of cores, memory hierarchy and exposed communication between tiles have presented a performance challenge for stream programs running on tiled multi-cores. In this paper, we present StreamTMC, an efficient stream compilation framework that optimizes the execution of stream applications for the tiled multi-core. This framework is composed of three optimization phases. First, a software pipelining schedule is constructed to exploit the parallelism. Second, an efficient hybrid of SPM and cache buffer allocation algorithm and data copy elimination mechanism is proposed to improve the efficiency of the data access. Last, a communication aware mapping is proposed to reduce the network communication and synchronization overhead. We implement the StreamTMC compiler on Godson-T, a 64-core tiled architecture and conduct an experimental study to verify the effectiveness. The experimental results indicate that StreamTMC can achieve an average of 58% improvement over the performance before optimization.