Memory-intensive applications present unique challenges to an ASIC designer in terms of the choice of memory organization, memory size requirements, bandwidth, access latencies, and so on. The high potential of single-chip distributed logic-memory architectures in addressing many of these issues has been recognized in general-purpose computing and, more recently, in ASIC design. However, such architectures will be adopted widely by designers only when general techniques and tools for efficient high-level synthesis (HLS) of multi-partitioned ASICs become available. The techniques presented in this paper are motivated by the fact that many memory-intensive applications exhibit irregular array data access patterns (for example, due to conditionals in loop nests). Synthesis should, therefore, be capable of determining a partitioned architecture in which array data and computations may have to be heterogeneously distributed to achieve the best performance speedup. Furthermore, the synthesis methodology should not be restricted by the nature of the array index functions (affine or otherwise) in a behavior. Therefore, our methodology employs simulation to provide information about the access patterns of array data references in a behavior, and this information is used by the rest of our analysis. We use a combination of clustering and min-cut style partitioning techniques to partition the behavior into sub-behaviors while considering various factors, including data access locality, balanced workloads, and inter-partition communication. Finally, we also employ an iterative improvement strategy to determine the best way of distributing array data into physical memory in each partition. Our experiments with several benchmark applications show that the proposed techniques can yield partitioned architectures that achieve up to 2.2X performance speedup over conventional HLS solutions, and up to 1.6X performance speedup over the best feasible homogeneous partitioning solution.
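To make the trade-off between locality, workload balance, and inter-partition communication more concrete, the following is a minimal sketch of cost-driven iterative improvement. It is not the paper's algorithm. It swaps in a simple greedy single-move local search, and it assumes that simulation has already produced per-(operation, array) access counts and per-operation workloads. The names `partition_cost`, `improve`, `accesses`, `op_work`, and `balance_weight` are hypothetical, and the cost model (one unit per remote access plus a weighted max-min load gap) is an illustrative assumption.

```python
from itertools import product


def partition_cost(accesses, op_work, op_part, arr_part, k, balance_weight=1.0):
    """Hypothetical cost: remote accesses (communication) + weighted load imbalance."""
    # Communication: every simulated access whose operation and array sit in
    # different partitions counts as one inter-partition transfer (assumption).
    comm = sum(
        count
        for (op, arr), count in accesses.items()
        if op_part[op] != arr_part[arr]
    )
    # Workload per partition, taken from simulated operation counts.
    loads = [0.0] * k
    for op, work in op_work.items():
        loads[op_part[op]] += work
    imbalance = max(loads) - min(loads)
    return comm + balance_weight * imbalance


def improve(accesses, op_work, op_part, arr_part, k, balance_weight=1.0):
    """Greedy local search: apply the best single move until no move lowers the cost."""
    op_part, arr_part = dict(op_part), dict(arr_part)  # work on copies
    best = partition_cost(accesses, op_work, op_part, arr_part, k, balance_weight)
    while True:
        best_move = None
        # Try moving each operation and each array to every other partition.
        for table in (op_part, arr_part):
            for item, target in product(list(table), range(k)):
                old = table[item]
                if target == old:
                    continue
                table[item] = target
                cost = partition_cost(accesses, op_work, op_part, arr_part, k, balance_weight)
                table[item] = old  # undo the trial move
                if cost < best:
                    best, best_move = cost, (table, item, target)
        if best_move is None:  # local minimum reached
            return op_part, arr_part, best
        table, item, target = best_move
        table[item] = target


# Toy profile (made up): loop L0 mostly touches array A, loop L1 mostly touches B.
accesses = {("L0", "A"): 100, ("L1", "B"): 100, ("L1", "A"): 5}
op_work = {"L0": 50, "L1": 50}
ops, arrs, cost = improve(accesses, op_work, {"L0": 0, "L1": 0}, {"A": 1, "B": 1}, k=2)
print(ops, arrs, cost)  # {'L0': 0, 'L1': 1} {'A': 0, 'B': 1} 5.0
```

Starting from an unbalanced assignment, the search first splits the two loops across partitions and then co-locates array `A` with `L0`. This leaves only the 5 accesses from `L1` to `A` as remote traffic. Because only strictly improving moves are accepted, the loop always terminates, but it can stop in a local minimum. This is one reason practical flows combine such moves with clustering or min-cut style partitioning, as the abstract describes.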