Generation of heterogeneous distributed architectures for memory-intensive applications through high-level synthesis

  • Authors:
  • Chao Huang;Srivaths Ravi;Anand Raghunathan;Niraj K. Jha

  • Affiliations:
  • Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA;DSPS Design Team, Texas Instruments, Bangalore, India;NEC Laboratories America, Princeton, NJ;Department of Electrical Engineering, Princeton University, Princeton, NJ

  • Venue:
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems
  • Year:
  • 2007

Abstract

Memory-intensive applications present unique challenges to an application-specific integrated circuit (ASIC) designer in terms of the choice of memory organization, memory size requirements, bandwidth, and access latencies. The high potential of single-chip distributed logic-memory architectures in addressing many of these issues has been recognized in general-purpose computing and, more recently, in ASIC design. The high-level synthesis (HLS) techniques presented in this paper are motivated by the fact that many memory-intensive applications exhibit irregular array data access patterns. Synthesis should therefore be capable of determining a partitioned architecture in which array data and computations may have to be heterogeneously distributed to achieve the best performance speed-up. We use a combination of clustering and min-cut style partitioning techniques to yield distributed architectures, based on simulation profiling, while considering factors including data access locality, balanced workloads, and inter-partition communication. Our experiments with several benchmark applications show that the proposed techniques yield two-way partitioned architectures that achieve up to 2.1× (average of 1.9×) performance speed-up over conventional HLS solutions, and up to 1.5× (average of 1.4×) performance speed-up over the best feasible homogeneous partitioning solution. At the same time, the reduction in the energy-delay product over conventional single-memory designs is up to 2.7× (average of 2.0×). Increasing the number of partitions makes further system performance improvement achievable at the cost of chip area.
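
To make the idea of min-cut style two-way partitioning concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a small graph whose nodes are arrays and computations and whose edge weights are profiled access counts; the node names, the weights, and the functions `cut_weight` and `best_two_way_partition` are all hypothetical. It uses exhaustive search over balanced partitions, which is only practical for tiny graphs, in place of the clustering and min-cut heuristics the abstract mentions.

```python
from itertools import combinations


def cut_weight(edges, part_a):
    """Sum the weights of edges whose endpoints lie in different partitions."""
    return sum(w for (u, v), w in edges.items() if (u in part_a) != (v in part_a))


def best_two_way_partition(nodes, edges):
    """Exhaustively search balanced two-way partitions for the minimum cut.

    Balance is enforced by putting exactly half of the nodes (rounded down)
    in the first partition. Ties keep the first partition found.
    """
    nodes = sorted(nodes)
    half = len(nodes) // 2
    best = None
    for combo in combinations(nodes, half):
        part_a = set(combo)
        cost = cut_weight(edges, part_a)
        if best is None or cost < best[0]:
            best = (cost, part_a, set(nodes) - part_a)
    return best


# Hypothetical profile: arrays A and B, computations f and g.
# Edge weights stand in for access counts gathered by simulation profiling.
nodes = ["A", "B", "f", "g"]
edges = {
    ("A", "f"): 90,
    ("B", "g"): 80,
    ("A", "g"): 5,
    ("B", "f"): 10,
    ("f", "g"): 3,
}

cost, part_a, part_b = best_two_way_partition(nodes, edges)
print(cost, sorted(part_a), sorted(part_b))
```

In this toy profile, array A is accessed mostly by f and array B mostly by g, so the search places each array with the computation that uses it most, and only the infrequent accesses cross the partition boundary. A realistic flow would replace the exhaustive search with a scalable heuristic and would also weigh workload balance, which this sketch reduces to equal node counts.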