Data transfers and storage are crucial cost factors in multimedia systems. Systematic methodologies are needed to obtain dramatic reductions in power, area and cycle count. Upcoming multimedia processing applications will require high memory bandwidth. In this paper, we estimate that a software reference implementation of an MPEG-4 video encoder typically requires five Gtransfers/s to main memory for the simple profile at level L2. This shows a clear need for optimization and for the use of intermediate memory stages. By applying our ACROPOLIS methodology, developed mainly to relieve this data access bottleneck, we have arrived at an implementation that needs a factor of 65 fewer background memory accesses. In addition, we show that aggressive source code transformations can substantially reduce memory transfers without sacrificing speed, even gaining about 10% in cache misses and cycles on a DEC Alpha.