Upcoming multimedia compression applications will require high memory bandwidth. In this paper, we estimate that a software reference implementation of an MPEG-4 video decoder typically requires 200 Mtransfers/s to memory to decode one CIF (352×288) Video Object Plane (VOP) at 30 frames/s. This imposes a heavy penalty not only in power consumption but also in performance.

However, we also show that aggressive code transformations can substantially reduce these memory transfers without sacrificing speed; on a DEC Alpha, they even reduce cache misses and cycle counts by about 10%. For this purpose, we have manually applied an extended version of our data transfer and storage exploration (DTSE) methodology, which was originally developed for custom hardware implementations.
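The abstract does not list the individual transformations that were applied. As a hedged illustration of the general idea only, the C sketch below shows loop fusion, one common transformation for cutting memory transfers; it is not claimed to be one of the authors' DTSE steps. The array names, the two "stage" operations and the length N are hypothetical, and counting calls to the load/store helpers is a crude stand-in for real memory traffic (it ignores caches and register allocation).

```c
#include <stddef.h>

#define N 1024  /* hypothetical line length, for illustration only */

/* Counts accesses to the arrays, standing in for memory transfers. */
static long transfers;

static int load(const int *a, size_t i) { transfers++; return a[i]; }
static void store(int *a, size_t i, int v) { transfers++; a[i] = v; }

/* Before: two loops that communicate through a full temporary array. */
static void unfused(const int *in, int *tmp, int *out) {
    for (size_t i = 0; i < N; i++)
        store(tmp, i, load(in, i) * 2);   /* hypothetical stage 1: scale */
    for (size_t i = 0; i < N; i++)
        store(out, i, load(tmp, i) + 1);  /* hypothetical stage 2: offset */
}

/* After loop fusion: the intermediate value lives in a local scalar,
   so the temporary array and its reads and writes disappear. */
static void fused(const int *in, int *out) {
    for (size_t i = 0; i < N; i++) {
        int t = load(in, i) * 2;
        store(out, i, t + 1);
    }
}

/* Runs both versions on the same input, stores the transfer counts in
   *before and *after, and returns 1 if the two outputs are identical. */
static int compare_versions(long *before, long *after) {
    static int in[N], tmp[N], out_a[N], out_b[N];
    for (size_t i = 0; i < N; i++) in[i] = (int)i;

    transfers = 0;
    unfused(in, tmp, out_a);
    *before = transfers;

    transfers = 0;
    fused(in, out_b);
    *after = transfers;

    for (size_t i = 0; i < N; i++)
        if (out_a[i] != out_b[i]) return 0;
    return 1;
}
```

Calling `compare_versions` reports 4N counted transfers for the unfused version and 2N for the fused one, with identical outputs. In this toy the halving comes entirely from eliminating the writes to and reads from `tmp`; in a real decoder the payoff depends on whether such an intermediate buffer would otherwise live in off-chip memory.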