High-level synthesis of distributed logic-memory architectures

Authors:
Chao Huang;Srivaths Ravi;Anand Raghunathan;Niraj K. Jha
Affiliations:
Princeton University, Princeton, NJ;NEC USA, Princeton, NJ;NEC USA, Princeton, NJ;Princeton University, Princeton, NJ
Venue:
Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design
Year:
2002




Abstract

With the increasing cost of global communication on-chip, high-performance designs for data-intensive applications require architectures that distribute hardware resources (computing logic, memories, interconnect, etc.) throughout a chip, while restricting computations and communications to geographic proximities. In this paper, we present a methodology for high-level synthesis (HLS) of distributed logic-memory architectures, i.e., architectures that have logic and memory distributed across several partitions in a chip. Conventional HLS tools are capable of extracting parallelism from a behavior for architectures that assume a monolithic controller/datapath communicating with a memory or memory hierarchy. This work provides techniques to extend the synthesis frontier to more general architectures that can extract both coarse- and fine-grained parallelism from data accesses and computations in a synergistic manner. Our methodology selects many possible ways of organizing data and computations, carefully examines the trade-offs (i.e., communication overheads, synchronization costs, area overheads) in choosing one solution over another, and utilizes conventional HLS techniques for intermediate steps. We have evaluated the proposed framework on several benchmarks by generating register-transfer level (RTL) implementations using an existing commercial HLS tool with and without our enhancements, and by subjecting the resulting RTL circuits to logic synthesis and layout. The results show that circuits designed as distributed logic-memory architectures using our framework achieve significant performance improvements (up to 5.31X, 3.45X on average) over well-optimized conventional designs with small area overheads (up to 19.3%, 15.1% on average).
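
The trade-off the abstract describes, parallelism gained by distributing data and computation versus the cost of communication between partitions, can be illustrated with a toy cost model. This is only a sketch under stated assumptions and not the paper's algorithm: the block data distribution, the three-point stencil loop, the function names `block_owner` and `estimate_cycles`, and the cost parameters `local_cost` and `remote_cost` are all hypothetical choices made for illustration.

```python
def block_owner(index, n, partitions):
    """Return the partition holding element `index` of an n-element array
    under a simple block distribution (an assumed layout)."""
    block = -(-n // partitions)  # ceiling division: elements per partition
    return index // block


def estimate_cycles(n, partitions, local_cost=1, remote_cost=4):
    """Estimate cycles for the loop b[i] = a[i-1] + a[i] + a[i+1], 1 <= i <= n-2.

    Assumptions (hypothetical): each partition executes the iterations whose
    a[i] it owns, a read from the partition's own memory costs `local_cost`,
    a read from another partition costs `remote_cost`, and partitions run
    in parallel, so the estimate is the busiest partition's cycle count.
    """
    cycles = [0] * partitions
    for i in range(1, n - 1):
        p = block_owner(i, n, partitions)
        for j in (i - 1, i, i + 1):
            same = block_owner(j, n, partitions) == p
            cycles[p] += local_cost if same else remote_cost
    return max(cycles)


if __name__ == "__main__":
    for parts in (1, 2, 4, 8):
        print(parts, estimate_cycles(16, parts))
```

With these assumed costs, `estimate_cycles(16, 1)` gives 42 cycles and `estimate_cycles(16, 4)` gives 18, so four partitions yield well under a 4X speedup because boundary iterations pay for remote reads. A synthesis flow of the kind the abstract outlines would weigh such communication costs, along with synchronization and area overheads, across many candidate organizations.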