In this paper we show that some expressions frequently used in multimedia applications can be formulated as a general add-multiply-add operation. We further show a hardwired implementation of the Add-Multiply-Add instruction that is no more complex than a multiplier implementation. Furthermore, we show that two frequently used motion estimation operations, the Sum and the Mean of Absolute Differences, can be implemented in hardware with approximately the same cycle time as multiplication. Our approach extends easily to compute the Sum and Mean of Absolute Differences of a 16×16 pixel block in no more than four machine cycles. Additionally, we propose a hardwired codec mechanism for the Paeth predictor used in the Portable Network Graphics (PNG) standard that requires at most two general-purpose ALU cycles. We extend the Paeth unit to include the median, maximum, and minimum operations on three inputs with no additional cycle time, and we extend the Add-Multiply-Add unit to include the mean of three numbers. Finally, we propose a multimedia hardware accelerator that accommodates all the proposed operations. The proposed unit is an extension of the multiply pipeline with ALU extensions and no extra stages added. The unit supports 32 instructions in total.
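For readers unfamiliar with the operations named above, the following is a minimal software sketch of their reference semantics only. It says nothing about the hardwired design or its cycle counts. The function names (`sad`, `mad`, `paeth`, `median3`) are hypothetical and chosen for illustration; the Paeth predictor follows the definition in the PNG specification, where `a` is the left neighbour, `b` the neighbour above, and `c` the upper-left neighbour.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(
        abs(pa - pb)
        for row_a, row_b in zip(block_a, block_b)
        for pa, pb in zip(row_a, row_b)
    )


def mad(block_a, block_b):
    """Mean of absolute differences: the SAD divided by the pixel count."""
    pixel_count = sum(len(row) for row in block_a)
    return sad(block_a, block_b) / pixel_count


def paeth(a, b, c):
    """Paeth predictor as defined for PNG filter type 4.

    a = left, b = above, c = upper-left. Returns whichever neighbour is
    closest to the initial estimate a + b - c, breaking ties in the
    order a, b, c.
    """
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def median3(a, b, c):
    """Median of three values using only min and max comparisons."""
    return max(min(a, b), min(max(a, b), c))


if __name__ == "__main__":
    current = [[1, 2], [3, 4]]
    reference = [[4, 2], [1, 1]]
    print(sad(current, reference))
    print(mad(current, reference))
    print(paeth(10, 20, 15))
    print(median3(3, 1, 2))
```

For a 16×16 block, `sad` accumulates 256 absolute differences; the abstract's claim is that the proposed hardware produces the same result in no more than four machine cycles.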