Improving the memory bandwidth utilization using loop transformations

Authors:
Minas Dasygenis;Erik Brockmeyer;Francky Catthoor;Dimitrios Soudris;Antonios Thanailakis
Affiliations:
VLSI Design and Testing Center, Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece;DESICS, IMEC, Leuven, Belgium;DESICS, IMEC, Leuven, Belgium;VLSI Design and Testing Center, Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece;VLSI Design and Testing Center, Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece
Venue:
PATMOS'05 Proceedings of the 15th International Conference on Integrated Circuit and System Design: Power and Timing Modeling, Optimization and Simulation
Year:
2005

Abstract

Embedded devices designed for real-time multimedia and telecom applications face a bottleneck in energy consumption and performance that grows more critical by the day. This bottleneck stems from the widening gap between processor and memory speeds. Many authors have addressed the problem, but existing techniques either consider only performance, without any other trade-off, or operate at the level of individual loops. We fill this gap by presenting a technique that parallelizes memory accesses through four loop transformations. Our estimates for two real-life applications from the multimedia and telecom domains show that the technique can either increase performance (by up to 35%) or lower energy consumption (by up to 20%) for the same cost.