Available parallelism in video applications
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Trace-driven studies of VLIW video signal processors
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Data-path synthesis of VLIW video signal processors
Proceedings of the 11th international symposium on System synthesis
Dynamic Parallel media processing using Speculative Broadcast Loop (SBL)
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors
PATMOS '02 Proceedings of the 12th International Workshop on Integrated Circuit Design. Power and Timing Modeling, Optimization and Simulation
Efficient orchestration of sub-word parallelism in media processors
Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
Tile size selection for low-power tile-based architectures
Proceedings of the 3rd conference on Computing frontiers
Synchroscalar: Evaluation of an embedded, multi-core architecture for media applications
Journal of Embedded Computing - Issues in embedded single-chip multicore architectures
Transactions on High-Performance Embedded Architectures and Compilers I
This paper presents a design study of the datapath for a very long instruction word (VLIW) video signal processor (VSP). VLIW architectures provide high parallelism and excellent high-level language programmability, but require careful attention to VLSI and compiler design. Flexible, high-bandwidth interconnect, high-connectivity register files, and fast cycle times are required to achieve real-time video signal processing. Parameterizable versions of key modules have been designed in a 0.25 μm process, allowing us to explore tradeoffs in the VLIW VSP design space. The designs target 33 operations per cycle at clock rates exceeding 600 MHz. Various VLIW code scheduling techniques have been applied to 6 VSP kernels and evaluated on 7 different candidate datapath designs. The results of these simulations indicate which architectural tradeoffs enhance overall performance in this application domain.
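To put the stated design targets in perspective, the peak throughput they imply can be worked out directly. The sketch below is only illustrative arithmetic: the function name is hypothetical, and it assumes every one of the 33 issue slots is filled on every cycle at exactly 600 MHz. The abstract gives 600 MHz as a figure the clock rates exceed, so the result is the peak at that floor. Sustained throughput on real kernels depends on how well the scheduler fills the slots and is not given here.

```python
def peak_ops_per_second(ops_per_cycle: int, clock_hz: float) -> float:
    """Peak throughput assuming every issue slot is used on every cycle."""
    return ops_per_cycle * clock_hz


# Design targets quoted in the abstract: 33 operations per cycle, 600 MHz clock.
peak = peak_ops_per_second(33, 600e6)
print(f"Peak throughput: {peak / 1e9:.1f} billion operations per second")
```

At these targets the peak works out to roughly 19.8 billion operations per second.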