Synthesis of Heterogeneous Distributed Architectures for Memory-Intensive Applications

Authors:
Chao Huang;Srivaths Ravi;Anand Raghunathan;Niraj K. Jha
Affiliations:
Princeton University, NJ;NEC Laboratories America, Princeton, NJ;NEC Laboratories America, Princeton, NJ;Princeton University, NJ
Venue:
Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design
Year:
2003

Abstract

Memory-intensive applications present unique challenges to an ASIC designer in terms of the choice of memory organization, memory size requirements, bandwidth and access latencies, etc. The high potential of single-chip distributed logic-memory architectures in addressing many of these issues has been recognized in general-purpose computing, and more recently in ASIC design. However, such architectures will be adopted widely by designers only when general techniques and tools for efficient high-level synthesis (HLS) of multi-partitioned ASICs become available. The techniques presented in this paper are motivated by the fact that many memory-intensive applications exhibit irregular array data access patterns (due to conditionals in loop nests, etc.). Synthesis should, therefore, be capable of determining a partitioned architecture, wherein array data and computations may have to be heterogeneously distributed for achieving the best performance speedup. Furthermore, the synthesis methodology should not be restricted by the nature of array index functions (affine or otherwise) in a behavior. Therefore, our methodology employs simulation to provide information about the access patterns of array data references in a behavior, which is used by the rest of our analysis. We use a combination of clustering and min-cut style partitioning techniques to partition the behavior into sub-behaviors while considering various factors including data access locality, balanced workloads, inter-partition communication, etc. Finally, we also employ an iterative improvement strategy to determine the best way of distributing array data into physical memory in each partition. Our experiments with several benchmark applications show that the proposed techniques can yield partitioned architectures that can achieve up to 2.2X performance speedup over conventional HLS solutions, while achieving up to 1.6X performance speedup over the best homogeneous partitioning solution feasible.
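
To make the idea of simulation-derived access patterns concrete, the following is a minimal Python sketch, not the authors' algorithm. It simulates a toy loop with a conditional and a non-affine index, records which partition touches which array element, and compares a homogeneous (even block) data placement with a simple profile-driven placement. The loop body, the constants N and PARTS, and all function names are hypothetical and chosen only for illustration; the greedy "place each element where it is accessed most" rule stands in for the clustering, min-cut, and iterative improvement steps described in the abstract.

```python
from collections import Counter, defaultdict

N = 16      # array length (illustrative assumption)
PARTS = 2   # number of partitions (illustrative assumption)


def owner_of_iteration(i):
    """Hypothetical computation mapping: split the loop range evenly."""
    return i * PARTS // N


def simulate():
    """Run a toy loop with a conditional and log per-element accesses.

    Returns a dict: element index -> Counter {partition: access count}.
    """
    counts = defaultdict(Counter)
    for i in range(N):
        p = owner_of_iteration(i)
        if i % 3 == 0:
            j = (i * i) % N   # non-affine index under a conditional
        else:
            j = i             # regular access
        counts[j][p] += 1
    return counts


def homogeneous_placement():
    """Baseline: element j lives in the partition that owns iteration j."""
    return {j: owner_of_iteration(j) for j in range(N)}


def profile_driven_placement(counts):
    """Greedy stand-in: place each element where it is accessed most."""
    placement = homogeneous_placement()
    for j, per_partition in counts.items():
        placement[j] = per_partition.most_common(1)[0][0]
    return placement


def remote_accesses(counts, placement):
    """Count accesses issued by a partition that does not hold the element."""
    return sum(
        n
        for j, per_partition in counts.items()
        for p, n in per_partition.items()
        if p != placement[j]
    )


if __name__ == "__main__":
    profile = simulate()
    print("homogeneous remote accesses:",
          remote_accesses(profile, homogeneous_placement()))
    print("profile-driven remote accesses:",
          remote_accesses(profile, profile_driven_placement(profile)))
```

Because the profile comes from simulation rather than from static analysis of the index expressions, the sketch works the same way whether the index function is affine or not. It ignores memory capacity, workload balance, and communication cost, all of which the abstract lists as factors the actual methodology considers.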