Architectural Support for the Stream Execution Model on General-Purpose Processors

Authors:
Jayanth Gummaraju;Mattan Erez;Joel Coburn;Mendel Rosenblum;William J. Dally
Affiliations:
Stanford University, USA;University of Texas at Austin, USA;Stanford University, USA;Stanford University, USA;Stanford University, USA
Venue:
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Year:
2007

Cited by (9):
Streamware: programming general-purpose multicore processors using streams
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems

Atomic Vector Operations on Chip Multiprocessors
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture

Comparative evaluation of memory models for chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)

An analytical model to exploit memory task scheduling
Proceedings of the 2010 Workshop on Interaction between Compilers and Computer Architecture

On-chip communication and synchronization mechanisms with cache-integrated network interfaces
Proceedings of the 7th ACM international conference on Computing frontiers

Memory Latency Reduction via Thread Throttling
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture

Cost-effectively offering private buffers in SoCs and CMPs
Proceedings of the international conference on Supercomputing

Mapping streaming languages to general purpose processors through vectorization
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing

Prefetching and cache management using task lifetimes
Proceedings of the 27th international ACM conference on International conference on supercomputing

Abstract

Stream processing has recently attracted much interest in both industry (e.g., Cell, NVIDIA G80, ATI R580) and academia (e.g., Stanford Merrimac, MIT RAW), and stream programs are becoming increasingly popular for both media and more general-purpose computing. Targeting these stream architectures requires a special style of programming, called stream programming, but it can yield large performance benefits. In this paper, we add a minimal set of architectural features to commodity general-purpose processors (e.g., Intel/AMD) to efficiently support the stream execution model. We design the extensions to reuse existing components of the general-purpose processor hardware as much as possible, investigating low-cost modifications to the CPU caches, the hardware prefetcher, and the execution core. With an increase of less than 1% in die area, along with judicious use of a software runtime system, we can efficiently support stream programming on traditional processor cores. We evaluate our techniques by running scientific applications on a cycle-level simulation system. The results show that our system executes stream programs as efficiently as possible, limited only by ALU performance and by the memory bandwidth needed to feed the ALUs.
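
The abstract does not spell out the mechanics of the stream execution model, so the following is only an illustrative sketch of the general style as it is commonly described: data is gathered in bulk into a small local buffer, a kernel computes on that buffer, and the results are scattered back in bulk. It is not the paper's design. The names `run_stream_program`, `scale_and_offset`, and `strip_size` are hypothetical, and the Python list `local_in` merely stands in for whatever on-chip storage (for example, a cache region) a real system would use.

```python
from typing import Callable, List, Sequence


def run_stream_program(
    memory: List[float],
    gather_idx: Sequence[int],
    scatter_idx: Sequence[int],
    kernel: Callable[[List[float]], List[float]],
    strip_size: int = 4,
) -> None:
    """Process a stream strip by strip: bulk gather, kernel, bulk scatter.

    Hypothetical illustration only. `strip_size` models the assumption that
    one strip of the stream fits in fast local storage.
    """
    if len(gather_idx) != len(scatter_idx):
        raise ValueError("gather and scatter index streams must match in length")

    for start in range(0, len(gather_idx), strip_size):
        stop = start + strip_size
        # Bulk gather: copy one strip of the input stream into a local buffer.
        local_in = [memory[i] for i in gather_idx[start:stop]]
        # Kernel: compute only on local data, with no direct memory access.
        local_out = kernel(local_in)
        # Bulk scatter: write the strip of results back to memory.
        for i, value in zip(scatter_idx[start:stop], local_out):
            memory[i] = value


def scale_and_offset(strip: List[float]) -> List[float]:
    """Example kernel (an assumption for illustration): y = 2x + 1."""
    return [2.0 * x + 1.0 for x in strip]


memory = [float(i) for i in range(16)]
gather_idx = [0, 2, 4, 6, 8, 10]   # read even-numbered slots
scatter_idx = [1, 3, 5, 7, 9, 11]  # write odd-numbered slots
run_stream_program(memory, gather_idx, scatter_idx, scale_and_offset)
```

Separating the memory transfers from the kernel in this way is what lets a stream system overlap data movement with computation, which is consistent with the abstract's claim that performance ends up bounded by ALU throughput and memory bandwidth.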