Memory binding for performance optimization of control-flow intensive behaviors

Authors:
Kamal S. Khouri; Ganesh Lakshminarayana; Niraj K. Jha
Affiliations:
Department of Electrical Engineering, Princeton University, Princeton, NJ; C&C Research Labs, NEC, Inc., Princeton, NJ; Department of Electrical Engineering, Princeton University, Princeton, NJ
Venue:
ICCAD '99 Proceedings of the 1999 IEEE/ACM international conference on Computer-aided design
Year:
1999

Abstract

This paper presents a memory binding algorithm for behaviors characterized by conditionals and deeply nested loops that access memory extensively through arrays. Unlike previous work, the algorithm accounts for the effects of branch probabilities and allocation constraints. First, we demonstrate through examples the importance of incorporating branch probabilities and allocation constraints when searching for a performance-efficient memory binding. We also show that these two factors are interdependent: varying one without considering the other may greatly affect the performance of the behavior. Second, we introduce a memory binding algorithm that can examine numerous bindings by employing an efficient performance estimation procedure. The estimation procedure exploits locality of execution, an inherent characteristic of the target behaviors, which enables it to assess the global impact of different bindings under the given allocation constraints.

We tested our algorithm on a number of benchmarks from the parallel computing domain. A series of experiments demonstrates the algorithm's ability to produce bindings that optimize performance, meet memory allocation constraints, and adapt to different resource constraints and branch probabilities. Results show that the algorithm requires 37% fewer memories with a performance loss of only 0.3% when compared to a parallel memory architecture. When compared to the best of a series of random memory bindings, the algorithm improves schedule performance by 21%.
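
The interplay between branch probabilities and allocation constraints described in the abstract can be illustrated with a toy model. The sketch below is not the paper's algorithm or its estimation procedure. It assumes single-port memories, treats each conditional branch as one block whose array accesses could all issue in the same cycle if no two share a memory, and finds the best binding by brute-force enumeration. All names (`block_cycles`, `expected_cycles`, `best_binding`, `example_blocks`) and the three-array example are hypothetical.

```python
from itertools import product


def block_cycles(accesses, binding):
    """Cycles for one block, assuming single-port memories (an assumption):
    accesses bound to the same memory are serialized."""
    load = {}
    for array in accesses:
        mem = binding[array]
        load[mem] = load.get(mem, 0) + 1
    return max(load.values(), default=0)


def expected_cycles(blocks, binding):
    """Probability-weighted cycle count over all blocks."""
    return sum(prob * block_cycles(accesses, binding) for prob, accesses in blocks)


def best_binding(arrays, blocks, num_memories):
    """Exhaustively try every array-to-memory assignment that uses at most
    num_memories memories (the allocation constraint) and keep the cheapest."""
    best = None
    for assignment in product(range(num_memories), repeat=len(arrays)):
        binding = dict(zip(arrays, assignment))
        cost = expected_cycles(blocks, binding)
        if best is None or cost < best[0]:
            best = (cost, binding)
    return best


def example_blocks(p_taken):
    """Hypothetical behavior: a conditional whose taken branch reads A and B,
    whose other branch reads A and C, followed by a block that reads B and C."""
    return [
        (p_taken, ["A", "B"]),
        (1.0 - p_taken, ["A", "C"]),
        (1.0, ["B", "C"]),
    ]


if __name__ == "__main__":
    arrays = ["A", "B", "C"]
    for p in (0.9, 0.1):
        cost, binding = best_binding(arrays, example_blocks(p), num_memories=2)
        print(f"p_taken={p}: binding={binding}, expected cycles={cost:.2f}")
```

With only two memories for three arrays, some pair of arrays must share a memory. In this toy model the cheapest pair to co-locate is the one accessed together on the less likely branch, so the preferred binding flips when the branch probability changes from 0.9 to 0.1. Exhaustive enumeration is used only because the example is tiny; it grows exponentially with the number of arrays.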