Efficient data streaming with on-chip accelerators: Opportunities and challenges

Authors:
Rui Hou;Lixin Zhang;Michael C. Huang;Kun Wang;Hubertus Franke;Yi Ge;Xiaotao Chang
Affiliations:
IBM China Research Laboratory;National Research Center of High Performance Computers, Institute of Computing Technology, Chinese Academy of Sciences;IBM T. J. Watson Research Center;IBM China Research Laboratory;IBM T. J. Watson Research Center;IBM China Research Laboratory;IBM China Research Laboratory
Venue:
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
Year:
2011

Abstract

The transistor density of microprocessors continues to increase as technology scales. Microprocessor designers have taken advantage of the additional transistors by integrating a significant number of cores onto a single die. However, adding more cores yields diminishing returns because of software and hardware scalability issues, so designers have started integrating on-chip special-purpose logic units (i.e., accelerators) that were previously available as PCI-attached units. It is anticipated that more accelerators will be integrated on-chip, both because transistors are increasingly abundant and because power budget limits prevent all logic from being powered at all times. Thus, on-chip accelerator architectures deserve more attention from the research community, and there is a wide spectrum of research opportunities in the design and optimization of accelerators. This paper attempts to bring out some insights by studying the data access streams of on-chip accelerators, in the hope of fostering future research in this area. Specifically, it uses a few simple case studies to show some common characteristics of the data streams introduced by on-chip accelerators, discusses the challenges and opportunities in exploiting these characteristics to optimize the power and performance of accelerators, and then analyzes the effectiveness of some simple proposed optimizing extensions.