A bandwidth-efficient architecture for media processing
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Communications of the ACM - Special issue on computer architecture
The demand for flexibility in media processing motivates the use of programmable processors. However, very-large-scale integration (VLSI) constraints limit the performance of traditional programmable architectures. In modern VLSI technology, computation is relatively cheap: thousands of arithmetic logic units (ALUs) operating at multigigahertz rates can fit on a modestly sized 1-square-centimeter die. Yet delivering instructions and data to those ALUs is prohibitively expensive. The Imagine media processor validates the hypothesis that careful management of bandwidth and parallelism, from the programming language to the hardware, results in both high performance and high performance per unit of power.