Efficient data streaming with on-chip accelerators: Opportunities and challenges

  • Authors:
  • Rui Hou;Lixin Zhang;Michael C. Huang;Kun Wang;Hubertus Franke;Yi Ge;Xiaotao Chang

  • Affiliations:
  • IBM China Research Laboratory;National Research Center of High Performance Computers, Institute of Computing Technology, Chinese Academy of Sciences;IBM T. J. Watson Research Center;IBM China Research Laboratory;IBM T. J. Watson Research Center;IBM China Research Laboratory;IBM China Research Laboratory

  • Venue:
  • HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
  • Year:
  • 2011

Abstract

The transistor density of microprocessors continues to increase as technology scales. Microprocessor designers have taken advantage of the additional transistors by integrating a significant number of cores onto a single die. However, a large number of cores is met with diminishing returns due to software and hardware scalability issues, and hence designers have started integrating on-chip special-purpose logic units (i.e., accelerators) that were previously available as PCI-attached units. It is anticipated that more accelerators will be integrated on-chip due to the increasing abundance of transistors and the fact that not all logic can be powered at all times due to power budget limits. Thus, on-chip accelerator architectures deserve more attention from the research community. There is a wide spectrum of research opportunities for the design and optimization of accelerators. This paper attempts to bring out some insights by studying the data access streams of on-chip accelerators, in the hope of fostering future research in this area. Specifically, this paper uses a few simple case studies to show some of the common characteristics of the data streams introduced by on-chip accelerators, discusses challenges and opportunities in exploiting these characteristics to optimize the power and performance of accelerators, and then analyzes the effectiveness of some simple proposed optimizing extensions.