Synthesis of Heterogeneous Distributed Architectures for Memory-Intensive Applications

  • Authors:
  • Chao Huang;Srivaths Ravi;Anand Raghunathan;Niraj K. Jha

  • Affiliations:
  • Princeton University, NJ;NEC Laboratories America, Princeton, NJ;NEC Laboratories America, Princeton, NJ;Princeton University, NJ

  • Venue:
  • Proceedings of the 2003 IEEE/ACM International Conference on Computer-Aided Design
  • Year:
  • 2003

Abstract

Memory-intensive applications present unique challenges to an ASIC designer in terms of the choice of memory organization, memory size requirements, bandwidth and access latencies, etc. The high potential of single-chip distributed logic-memory architectures in addressing many of these issues has been recognized in general-purpose computing, and more recently in ASIC design. However, such architectures will be adopted widely by designers only when general techniques and tools for efficient high-level synthesis (HLS) of multi-partitioned ASICs become available. The techniques presented in this paper are motivated by the fact that many memory-intensive applications exhibit irregular array data access patterns (due to conditionals in loop nests, etc.). Synthesis should, therefore, be capable of determining a partitioned architecture, wherein array data and computations may have to be heterogeneously distributed for achieving the best performance speedup. Furthermore, the synthesis methodology should not be restricted by the nature of array index functions (affine or otherwise) in a behavior. Therefore, our methodology employs simulation to provide information about the access patterns of array data references in a behavior, which is used by the rest of our analysis. We use a combination of clustering and min-cut style partitioning techniques to partition the behavior into sub-behaviors while considering various factors including data access locality, balanced workloads, inter-partition communication, etc. Finally, we also employ an iterative improvement strategy to determine the best way of distributing array data into physical memory in each partition. Our experiments with several benchmark applications show that the proposed techniques can yield partitioned architectures that can achieve up to 2.2X performance speedup over conventional HLS solutions, while achieving up to 1.6X performance speedup over the best homogeneous partitioning solution feasible.
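
The flow outlined in the abstract (simulate to collect array accesses, partition the behavior to reduce inter-partition traffic, then place array data) can be illustrated with a minimal sketch. This is not the authors' algorithm: the trace format, the function names (`affinity_graph`, `cut_weight`, `bipartition`, `place_data`), and the greedy pairwise-swap heuristic standing in for the clustering and min-cut style partitioning are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical simulation trace: (operation, array, index) per array access.
example_trace = [
    ("op0", "A", 0), ("op0", "A", 1),
    ("op2", "A", 0), ("op2", "A", 1),
    ("op1", "B", 5), ("op3", "B", 5),
]

def affinity_graph(trace):
    """Weight each pair of operations by the array elements they share."""
    touched = {}
    for op, array, index in trace:
        touched.setdefault(op, Counter())[(array, index)] += 1
    graph = {}
    for a, b in combinations(sorted(touched), 2):
        shared = sum(min(n, touched[b][k])
                     for k, n in touched[a].items() if k in touched[b])
        if shared:
            graph[(a, b)] = shared
    return graph, sorted(touched)

def cut_weight(graph, side):
    """Total weight of edges whose endpoints sit in different partitions."""
    return sum(w for (a, b), w in graph.items() if side[a] != side[b])

def bipartition(graph, ops):
    """Greedy pairwise swaps (assumed heuristic); swaps keep sizes balanced."""
    half = len(ops) // 2
    side = {op: (0 if i < half else 1) for i, op in enumerate(ops)}
    best = cut_weight(graph, side)
    improved = True
    while improved:
        improved = False
        for a in ops:
            for b in ops:
                if side[a] == 0 and side[b] == 1:
                    side[a], side[b] = 1, 0      # tentative swap
                    cost = cut_weight(graph, side)
                    if cost < best:
                        best, improved = cost, True
                    else:
                        side[a], side[b] = 0, 1  # undo: no strict gain
    return side

def place_data(trace, side):
    """Put each array element in the partition that accesses it most."""
    votes = {}
    for op, array, index in trace:
        votes.setdefault((array, index), Counter())[side[op]] += 1
    return {elem: c.most_common(1)[0][0] for elem, c in votes.items()}
```

Calling `affinity_graph(example_trace)`, then `bipartition`, then `place_data` groups the operations that touch array `A` in one partition and those that touch `B` in the other, with each array element stored next to its users. Because the access information comes from a recorded trace rather than from analysis of index expressions, the same sketch works whether or not the index functions are affine, which is the property the abstract highlights.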