The Imagine Stream Processor is a single-chip programmable media processor with 48 parallel ALUs. At 400 MHz, this translates to a peak arithmetic rate of 16 GFLOPS on single-precision data and 32 GOPS on 16-bit fixed-point data. The scalability of Imagine's programming model and architecture enables it to achieve such high arithmetic rates. Imagine executes applications that have been mapped to the stream programming model. The stream model decomposes applications into a set of computation kernels that operate on data streams. This mapping exposes the inherent locality and parallelism in the application, and Imagine exploits both to provide a scalable architecture that supports 48 ALUs on a single chip. The first half of this paper presents the Imagine architecture and programming model; the second half explores the scalability of the architecture.
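To make the stream model concrete, here is a minimal Python sketch of an application decomposed into kernels that operate on data streams. It is an illustration only, not Imagine's actual programming interface: the names `scale_kernel`, `offset_kernel`, and `run_pipeline` are hypothetical, and Python generators stand in for hardware streams.

```python
from typing import Callable, Iterable, Iterator, List

# A "stream" here is just an iterable of records (assumption: floats).
Stream = Iterable[float]
Kernel = Callable[[Stream], Iterator[float]]


def scale_kernel(stream: Stream, gain: float) -> Iterator[float]:
    """Hypothetical kernel: multiply every stream element by a gain."""
    for x in stream:
        yield x * gain


def offset_kernel(stream: Stream, bias: float) -> Iterator[float]:
    """Hypothetical kernel: add a bias to every stream element."""
    for x in stream:
        yield x + bias


def run_pipeline(source: Stream, kernels: List[Kernel]) -> List[float]:
    """Chain kernels so each one consumes the stream the previous one produced."""
    stream: Stream = source
    for kernel in kernels:
        stream = kernel(stream)
    return list(stream)


result = run_pipeline(
    [1.0, 2.0, 3.0],
    [
        lambda s: scale_kernel(s, 2.0),
        lambda s: offset_kernel(s, 1.0),
    ],
)
print(result)  # [3.0, 5.0, 7.0]
```

In this sketch each kernel processes every element independently, which is the kind of parallelism that could be spread across many ALUs, and the intermediate stream is handed directly from one kernel to the next, which is one way to read the locality the abstract refers to.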