Discrete cosine transform: algorithms, advantages, applications
Discrete cosine transform: algorithms, advantages, applications
The SPARC architecture manual: version 8
The SPARC architecture manual: version 8
Instruction set selection for ASIP design
CODES '99 Proceedings of the seventh international workshop on Hardware/software codesign
IEEE Micro
Garp: a MIPS processor with a reconfigurable coprocessor
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
PACT XPP—A Self-Reconfigurable Data Processing Architecture
The Journal of Supercomputing
Vector microprocessors
A novel four-step search algorithm for fast block motion estimation
IEEE Transactions on Circuits and Systems for Video Technology
A novel unrestricted center-biased diamond search algorithm for block motion estimation
IEEE Transactions on Circuits and Systems for Video Technology
Hi-index | 0.00 |
This work presents a detailed case study in customizing a configurable, extensible, 32-bit RISC processor with vector/SIMD instruction extensions for the efficient execution of block-based video-coding algorithms utilizing a proprietary co-design environment. In addition to the default Full-Search motion estimation of the MPEG-2 Test Model 5, fourteen fast ME algorithms were implemented in both scalar and vector form. Results demonstrate a reduction of up to 68% in the dynamic instruction count of the full search-based encoder whereas the fast motion estimation algorithms achieved a reduction in instruction count of nearly 90%, both accelerated via three 128-bit vector/SIMD instructions when compared to the scalar, reference implementation of the standard. We address in detail the profiling, vectorization and the development of these vector instruction set extensions, discuss in depth the implementation of a parametric vector accelerator that implements these instructions and show the introduction of that accelerator into a 32-bit RISC processor pipeline, in a closely-coupled configuration.