Simultaneous multithreading (SMT) is being incorporated into modern superscalar microprocessors, allowing several independent threads to issue instructions to the functional units in a single cycle. Effective use of SMT can hide the inefficiencies caused by long operation latencies, thereby improving the utilization of the processor's resources. In this paper we explore techniques to exploit this capability efficiently and examine its interaction with short-vector processing. We put special emphasis on the differences in algorithm tuning between SMT architectures and shared-memory symmetric multiprocessors. As a case study we have chosen the well-known Discrete Wavelet Transform (DWT), a central component of image and video coding standards such as MPEG-4 and JPEG 2000.