The memory required to implement the 2D wavelet transform typically incurs relatively high power consumption and limits speed performance. In this paper, we propose an optimized architecture for the 1D/2D wavelet transform that reduces the memory size cost by one order of magnitude compared to classical implementation styles. This so-called Local Wavelet Transform also minimizes the memory access cost, thanks to its spatially localized processing. Furthermore, the proposed architecture introduces concurrency into the data transfer mechanism, so that speed performance is not limited by data transfer delays to and from main (off-chip) memory. Finally, because parent-children trees are produced in indivisible clusters, the transform interfaces easily with Zero-Tree encoder modules while preserving Region-of-Interest functionality. Practical implementations of the 1D and 2D Local Wavelet Transform, with wavelet filters of up to 9/7 taps and a large number of decomposition levels (e.g., 4 or 5), can process 10 Msamples/s with an internal processing clock of 40 MHz in a very modest 0.7 µm CMOS process.
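To make the comparison baseline concrete, here is a minimal Python sketch of one classical implementation style: a whole-frame, row-column separable 2-D DWT with a 9/7 filter pair computed by lifting. It is not the Local Wavelet Transform proposed here. It only illustrates the kind of full-frame processing whose memory cost the proposed architecture targets. Several choices are assumptions made for illustration and do not come from the paper:

- the standard CDF 9/7 lifting coefficients used in JPEG 2000, assumed to match the "9/7-tap" filters mentioned above;
- whole-sample symmetric boundary extension;
- the 1/K and K subband scaling;
- the hypothetical function names `dwt97_1d`, `idwt97_1d` and `dwt97_2d`.

```python
# Sketch only: classical separable 2-D DWT with the CDF 9/7 pair via lifting.
# Assumptions (not from the paper): JPEG 2000 lifting coefficients, whole-sample
# symmetric extension, 1/K low-pass and K high-pass scaling, even-length
# rows/columns at every decomposition level. Function names are hypothetical.

ALPHA = -1.586134342059924
BETA = -0.052980118572961
GAMMA = 0.882911075530934
DELTA = 0.443506852043971
K = 1.230174104914001


def _lift(x, coef, odd):
    """One lifting step on interleaved samples x, in place.

    odd=True updates odd (high-pass) samples from their even neighbours;
    odd=False updates even (low-pass) samples from their odd neighbours.
    Neighbours outside the signal are mirrored (symmetric extension).
    """
    n = len(x)
    for i in range(1 if odd else 0, n, 2):
        left = x[i - 1] if i > 0 else x[i + 1]
        right = x[i + 1] if i + 1 < n else x[i - 1]
        x[i] += coef * (left + right)


def dwt97_1d(signal):
    """One level of the forward 9/7 DWT; returns (low, high) halves."""
    x = [float(v) for v in signal]
    if len(x) < 2 or len(x) % 2:
        raise ValueError("length must be even and >= 2")
    for coef, odd in ((ALPHA, True), (BETA, False), (GAMMA, True), (DELTA, False)):
        _lift(x, coef, odd)
    low = [v / K for v in x[0::2]]
    high = [v * K for v in x[1::2]]
    return low, high


def idwt97_1d(low, high):
    """Inverse of dwt97_1d: undo the scaling, then the lifting steps in reverse."""
    x = [0.0] * (2 * len(low))
    x[0::2] = [v * K for v in low]
    x[1::2] = [v / K for v in high]
    for coef, odd in ((DELTA, False), (GAMMA, True), (BETA, False), (ALPHA, True)):
        _lift(x, -coef, odd)
    return x


def dwt97_2d(image, levels):
    """Multi-level separable 2-D DWT of a list-of-rows image (Mallat layout).

    Each level filters every row, then every column, of the current LL band.
    The whole frame is buffered in `out`, so memory grows with frame size.
    """
    out = [[float(v) for v in row] for row in image]
    h, w = len(out), len(out[0])
    for _ in range(levels):
        for r in range(h):  # row pass: L half left, H half right
            low, high = dwt97_1d(out[r][:w])
            out[r][:w] = low + high
        for c in range(w):  # column pass: strided accesses across the frame
            col = [out[r][c] for r in range(h)]
            low, high = dwt97_1d(col)
            for r, v in enumerate(low + high):
                out[r][c] = v
        h, w = h // 2, w // 2  # recurse on the LL band only
    return out
```

With these assumed scalings, a constant image ends up entirely in the LL band at its original value, and every detail coefficient is close to zero. `dwt97_2d` keeps the entire frame in `out` and sweeps it column by column with strided accesses. Both the buffer size and the access pattern grow with the frame, and these are the two costs that the spatially localized processing described above aims to reduce.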