High-level synthesis using computation-unit integrated memories

Authors:
Chao Huang;S. Ravi;A. Raghunathan;N. K. Jha
Affiliations:
Dept. of Electr. Eng., Princeton Univ., NJ, USA;NEC Laboratories America, Princeton, NJ, USA;NEC Laboratories America, Princeton, NJ, USA;Dept. of Electr. Eng., Princeton Univ., NJ, USA
Venue:
Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
Year:
2004

Abstract

High-level synthesis (HLS) of memory-intensive applications has featured several innovations in terms of enhancements made to the basic memory organization and data layout. However, increasing performance and energy demands faced by application-specific integrated circuits (ASICs) are forcing designers to alter the fundamental architectural template of the HLS output, namely, a controller-datapath associated with a memory subsystem (monolithic, banked, etc.). We propose an architectural template for the HLS output that consists of a controller-datapath circuit associated with a memory subsystem into which computation units have been integrated. The enhanced memory subsystem is called computation-unit integrated memory (CIM). A CIM offers higher memory bandwidth (relative to what is offered through the system bus) to computation units present locally within it and reduces the overall communication between the memory subsystem and the controller-datapath, thus providing a template highly suitable for deriving efficient implementations of memory-intensive applications. This work addresses the challenge of providing an automatic synthesis framework for a CIM-based architecture. Our framework can analyze the various trade-offs involved in selecting suitable operations in a behavior for execution using a CIM and generate a high-performance, low-overhead implementation. Experiments with several behaviors indicate that an average performance improvement of 1.88× (a maximum of 2.63×) is possible with very low area overheads. The energy-delay product improves by an average of 2.1× (maximum of 3.4×).
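
The trade-off described in the abstract (higher local bandwidth inside the CIM versus the cost of returning results over the system bus) can be illustrated with a toy cost model. The sketch below is not the paper's synthesis framework: the `Op` fields, the cycle formulas, the bandwidth figures, and the example operations are all assumptions chosen for illustration, and area overhead is ignored entirely.

```python
from dataclasses import dataclass


@dataclass
class Op:
    """Hypothetical description of one operation in a behavior."""
    name: str
    words_accessed: int  # memory words the operation reads or writes
    compute_cycles: int  # cycles of pure computation
    result_words: int    # words sent back to the datapath if run in the CIM


def cycles_on_datapath(op: Op, bus_bw: float) -> float:
    # Assumption: every accessed word crosses the system bus.
    return op.words_accessed / bus_bw + op.compute_cycles


def cycles_in_cim(op: Op, local_bw: float, bus_bw: float) -> float:
    # Assumption: operands are accessed at the CIM's local bandwidth,
    # and only the results cross the system bus.
    return (op.words_accessed / local_bw
            + op.compute_cycles
            + op.result_words / bus_bw)


def select_for_cim(ops, bus_bw: float, local_bw: float):
    """Return names of operations that are estimated to run faster in the CIM."""
    return [
        op.name
        for op in ops
        if cycles_in_cim(op, local_bw, bus_bw) < cycles_on_datapath(op, bus_bw)
    ]


# Illustrative operations (made-up numbers, not from the paper).
ops = [
    Op("array_sum", words_accessed=1024, compute_cycles=1024, result_words=1),
    Op("scalar_mul", words_accessed=2, compute_cycles=4, result_words=2),
]

# Bandwidths in words per cycle: bus = 1, CIM-local = 4 (assumed values).
selected = select_for_cim(ops, bus_bw=1.0, local_bw=4.0)
print(selected)
```

Under these assumed numbers, the memory-heavy reduction benefits from running inside the CIM, while the small scalar operation does not, because returning its results over the bus costs more than the local bandwidth saves. A real framework would also have to weigh area overhead and interactions between operations, which this per-operation model leaves out.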