DAC '90 Proceedings of the 27th ACM/IEEE Design Automation Conference
High-level synthesis: introduction to chip and system design
High-level synthesis: introduction to chip and system design
Memory estimation for high level synthesis
DAC '94 Proceedings of the 31st annual Design Automation Conference
Synthesis of application-specific memory designs
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Computational geometry: algorithms and applications
Computational geometry: algorithms and applications
Active pages: a computation model for intelligent memory
Proceedings of the 25th annual international symposium on Computer architecture
Power exploration for dynamic data types through virtual memory management refinement
ISLPED '98 Proceedings of the 1998 international symposium on Low power electronics and design
Smart Memories: a modular reconfigurable architecture
Proceedings of the 27th annual international symposium on Computer architecture
Memory binding for performance optimization of control-flow intensive behaviors
ICCAD '99 Proceedings of the 1999 IEEE/ACM international conference on Computer-aided design
High-level library mapping for memories
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Exact memory size estimation for array computations
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special issue on the 11th international symposium on system-level synthesis and design (ISSS'98)
Synthesis of hardware models in C with pointers and complex data structures
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - System Level Design
Synthesis and Optimization of Digital Circuits
Synthesis and Optimization of Digital Circuits
Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design
Memory Issues in Embedded Systems-on-Chip: Optimizations and Exploration
Memory Issues in Embedded Systems-on-Chip: Optimizations and Exploration
Algorithmic and Register-Transfer Level Synthesis: The System Architect's Workbench
Algorithmic and Register-Transfer Level Synthesis: The System Architect's Workbench
IEEE Transactions on Parallel and Distributed Systems
Architectural exploration for datapaths with memory hierarchy
EDTC '95 Proceedings of the 1995 European conference on Design and Test
The MIMOLA design system: Detailed description of the software system
DAC '79 Proceedings of the 16th Design Automation Conference
Behavioral Array Mapping into Multiport Memories Targeting Low Power
VLSID '97 Proceedings of the Tenth International Conference on VLSI Design: VLSI in Multimedia Applications
Automatic Computation and Data Decomposition for Multiprocessors
Automatic Computation and Data Decomposition for Multiprocessors
Improving parallelism and data locality with affine partitioning
Improving parallelism and data locality with affine partitioning
Techniques for minimizing and balancing I/O during functional partitioning
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Synthesis of Heterogeneous Distributed Architectures for Memory-Intensive Applications
Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design
MP core: algorithm and design techniques for efficient channel estimation in wireless applications
Proceedings of the 42nd annual Design Automation Conference
Transformation synthesis for data intensive applications to FPGAs
GLSVLSI '06 Proceedings of the 16th ACM Great Lakes symposium on VLSI
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Iterational retiming with partitioning: Loop scheduling with complete memory latency hiding
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
With the increasing cost of global communication on-chip, high-performance designs for data-intensive applications require architectures that distribute hardware resources (computing logic, memories, interconnect, etc.) throughout a chip, while restricting computations and communications to geographic proximities. In this paper, we present a methodology for high-level synthesis (HLS) of distributed logic-memory architectures, i.e., architectures that have logic and memory distributed across several partitions in a chip. Conventional HLS tools are capable of extracting parallelism from a behavior for architectures that assume a monolithic controller/datapath communicating with a memory or memory hierarchy. This work provides techniques to extend the synthesis frontier to more general architectures that can extract both coarse- and fine-grained parallelism from data accesses and computations in a synergistic manner. Our methodology selects many possible ways of organizing data and computations, carefully examines the trade-offs (i.e., communication overheads, synchronization costs, area overheads) in choosing one solution over another, and utilizes conventional HLS techniques for intermediate steps.We have evaluated the proposed framework on several benchmarks by generating register-transfer level (RTL) implementations using an existing commercial HLS tool with and without our enhancements, and by subjecting the resulting RTL circuits to logic synthesis and layout. The results show that circuits designed as distributed logic-memory architectures using our framework achieve significant (upto, 5.31X average of 3.45X) performance improvements over well-optimized conventional designs with small area overheads (upto 19.3%, 15.1% on average).