The Imagine Stream Processor

Authors:
Affiliations:
Venue: ICCD '02 Proceedings of the 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD'02)
Year: 2002

Citing 0
Cited 51

A performance analysis of PIM, stream processing, and tiled processing on memory-intensive signal processing kernels

Proceedings of the 30th annual international symposium on Computer architecture
The Reconfigurable Streaming Vector Processor (RSVP™)

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Cluster prefetch: tolerating on-chip wire delays in clustered microarchitectures

Proceedings of the 18th annual international conference on Supercomputing
Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams

Proceedings of the 31st annual international symposium on Computer architecture
Brook for GPUs: stream computing on graphics hardware

ACM SIGGRAPH 2004 Papers
Bandwidth Management with a Reconfigurable Data Cache

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 3 - Volume 04
Extracting Speedup From C-Code With Poor Instruction-Level Parallelism

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 14 - Volume 15
RPU: a programmable ray processing unit for realtime ray tracing

ACM SIGGRAPH 2005 Papers
Stream Programming on General-Purpose Processors

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Shader Performance Analysis on a Modern GPU Architecture

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
ClawHMMER: A Streaming HMMer-Search Implementation

SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
A defect tolerant self-organizing nanoscale SIMD architecture

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Sequoia: programming the memory hierarchy

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
A wire delay-tolerant reconfigurable unit for a clustered programmable-reconfigurable processor

Microprocessors & Microsystems
A 64-bit stream processor architecture for scientific applications

Proceedings of the 34th annual international symposium on Computer architecture
Inter-cluster communication in VLIW architectures

ACM Transactions on Architecture and Code Optimization (TACO)
A self-organizing defect tolerant SIMD architecture

ACM Journal on Emerging Technologies in Computing Systems (JETC)
Dryad: distributed data-parallel programs from sequential building blocks

Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Rapid VLIW processor customization for signal processing applications using combinational hardware functions

EURASIP Journal on Applied Signal Processing
Explicit data organization SIMD instruction set architecture for media processors

PDCN'07 Proceedings of the 25th IASTED International Multi-Conference: parallel and distributed computing and networks
Transform coding on programmable stream processors

The Journal of Supercomputing
Exploiting loop-dependent stream reuse for stream processors

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
GRAMPS: A programming model for graphics pipelines

ACM Transactions on Graphics (TOG)
Streaming implementation of a sequential decompression algorithm on an FPGA

Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Compiler-directed scratchpad memory management via graph coloring

ACM Transactions on Architecture and Code Optimization (TACO)
SRF coloring: stream register file allocation via graph coloring

Journal of Computer Science and Technology
Real-time Visual Tracker by Stream Processing

Journal of Signal Processing Systems
High Performance Matrix Multiplication on Many Cores

Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Design and implementation of stream processing system and library for CELL broadband engine processors

PDCS '07 Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems
A Skeletal Parallel Framework with Fusion Optimizer for GPGPU Programming

APLAS '09 Proceedings of the 7th Asian Symposium on Programming Languages and Systems
SP@CE: an SP-based programming model for consumer electronics streaming applications

LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Optimizing stream organization to improve the performance of scientific computing applications on the stream processor

ICA3PP'07 Proceedings of the 7th international conference on Algorithms and architectures for parallel processing
Implementation and evaluation of Jacobi iteration on the imagine stream processor

HiPC'07 Proceedings of the 14th international conference on High performance computing
Implementation and optimization of dense LU decomposition on the stream processor

PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics
Implementing and optimizing a data-intensive hydrodynamics application on the stream processor

ICCSA'07 Proceedings of the 2007 international conference on Computational science and its applications - Volume Part III
Exploiting the reuse supplied by loop-dependent stream references for stream processors

ACM Transactions on Architecture and Code Optimization (TACO)
Understanding throughput-oriented architectures

Communications of the ACM
Optimal synthesis of latency and throughput constrained pipelined MPSoCs targeting streaming applications

CODES/ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Scientific computing applications on the imagine stream processor

ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
A streaming implementation of transform and quantization in h.264

HPCC'06 Proceedings of the Second international conference on High Performance Computing and Communications
Software-Oriented system-level simulation for design space exploration of reconfigurable architectures

ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Parallelizing SOR for GPGPUs using alternate loop tiling

Parallel Computing
Compiler-assisted energy optimization for clustered VLIW processors

Journal of Parallel and Distributed Computing
StreamPI: a stream-parallel programming extension for object-oriented programming languages

The Journal of Supercomputing
Laplace transformation on the FT64 stream processor

ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
Architecture-based optimization for mapping scientific applications to imagine

ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
Implementation and optimization of sparse matrix-vector multiplication on imagine stream processor

ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
Sigma*: symbolic learning of input-output specifications

POPL '13 Proceedings of the 40th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
High-level design space and flexibility exploration for adaptive, energy-efficient WCDMA channel estimation architectures

International Journal of Reconfigurable Computing - Special issue on Selected Papers from the 2011 International Conference on Reconfigurable Computing and FPGAs (ReConFig 2011)
Clustering scheduling for hardware tasks in reconfigurable computing systems

Journal of Systems Architecture: the EUROMICRO Journal

Abstract

The Imagine Stream Processor is a single-chip programmable media processor with 48 parallel ALUs. At 400 MHz, this translates to a peak arithmetic rate of 16 GFLOPS on single-precision data and 32 GOPS on 16-bit fixed-point data. The scalability of Imagine's programming model and architecture enables it to achieve such high arithmetic rates. Imagine executes applications that have been mapped to the stream programming model, which decomposes an application into a set of computation kernels that operate on data streams. This mapping exposes the application's inherent locality and parallelism, which Imagine exploits to provide a scalable architecture that supports 48 ALUs on a single chip. The first half of this paper presents the Imagine architecture and programming model; the second half explores the scalability of the architecture.
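
As a rough illustration of the stream programming model the abstract describes, the sketch below expresses a small application as two kernels chained over a data stream. It is plain Python, not Imagine's actual toolchain or API: the names `scale_kernel`, `moving_sum_kernel`, and `run_application` are hypothetical, and the kernels themselves are invented for the example. It is only meant to show how each kernel touches a stream element by element with a small amount of local state (locality) and how the elements of a stream could in principle be processed in parallel.

```python
from collections import deque
from typing import Iterable, Iterator, List

# A "stream" here is just an iterable of samples (an assumption for
# illustration; a real stream processor moves records through hardware).
Stream = Iterable[float]


def scale_kernel(stream: Stream, gain: float) -> Iterator[float]:
    """Hypothetical kernel: multiply every stream element by a constant.

    Each output depends on exactly one input element, so the elements
    are independent and could be processed in parallel.
    """
    for x in stream:
        yield gain * x


def moving_sum_kernel(stream: Stream, window: int) -> Iterator[float]:
    """Hypothetical kernel: sliding-window sum over the input stream.

    Only the last `window` elements are kept as local state, which
    stands in for the locality a kernel exposes.
    """
    buf: deque = deque(maxlen=window)
    for x in stream:
        buf.append(x)
        if len(buf) == window:
            yield sum(buf)


def run_application(samples: Stream) -> List[float]:
    """Chain the kernels: the output stream of one feeds the next."""
    scaled = scale_kernel(samples, gain=0.5)
    return list(moving_sum_kernel(scaled, window=2))


if __name__ == "__main__":
    print(run_application([2.0, 4.0, 6.0, 8.0]))  # [3.0, 5.0, 7.0]
```

Passing intermediate results between kernels as streams, rather than through shared global arrays, is the design point this sketch tries to mirror: producer and consumer are explicit, so the data movement between kernels is visible in the program structure.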