A scalable synthesis methodology for application-specific processors

  • Authors:
  • Fei Sun;Srivaths Ravi;Anand Raghunathan;Niraj K. Jha

  • Affiliations:
  • Tensilica Inc., Santa Clara, CA;NEC Laboratories America Inc., Princeton, NJ;NEC Laboratories America Inc., Princeton, NJ;Department of Electrical Engineering, Princeton University, Princeton, NJ

  • Venue:
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems
  • Year:
  • 2006

Abstract

Custom processors based on application-specific or domain-specific instruction sets are gaining popularity, and are often used to implement critical architectural blocks in complex systems-on-chip. While several advances have been made in the area of custom processor architectures, tools, and design methodologies, designers are still required to manually perform some critical tasks, such as selection of the custom instructions best suited to the given application and design constraints. We present a scalable methodology for the synthesis of a custom processor from an embedded software program. A key feature of the proposed methodology is its scalability, which is achieved by exploiting the structured, hierarchical nature of large software programs. We motivate the need for such a methodology, and describe the algorithms used for the critical steps, including hardware resource budgeting, local optimizations, and global exploration. Our methodology utilizes the concept of "soft" instruction templates, which can be adapted by adding operations to them or deleting operations from them at any time during the design space exploration process, allowing global design decisions to be interleaved with fine-grained optimizations. To the best of our knowledge, this is the first work that uses the program hierarchy to derive soft instruction templates to synthesize application-specific processors for scalable applications. We have integrated our methodology into an open-source compiler, and verified it using a commercial extensible processor. Experiments with several benchmarks indicate that our methodology can effectively tackle large programs. It results in the synthesis of high-quality custom processors that demonstrate an average speedup of 2.82× and a maximum speedup of 6.07×. As a side effect, the processor energy is also reduced. The average and maximum reductions in the energy-delay product for the benchmarks are 7.64× and 18.85×, respectively. The CPU times required for custom processor synthesis are quite small, indicating that the proposed techniques can be applied to embedded software programs of significant complexity.
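
The abstract reports speedup and energy-delay product (EDP) reduction as separate figures. The minimal sketch below shows how the two relate for a single benchmark, using the standard definition EDP = energy × delay. The function names and input values are hypothetical, chosen only for illustration; they are not results from the paper. The averages reported above are taken across benchmarks, so they need not factor in this way.

```python
def edp(energy: float, delay: float) -> float:
    """Energy-delay product under the standard definition: energy times delay."""
    return energy * delay


def edp_reduction(speedup: float, energy_reduction: float) -> float:
    """EDP reduction factor for ONE benchmark.

    speedup          = delay_base / delay_custom
    energy_reduction = energy_base / energy_custom
    Because EDP is a product, its reduction factor is the product of the two ratios.
    """
    return speedup * energy_reduction


# Hypothetical baseline and custom-processor measurements (illustrative only).
base_energy, base_delay = 6.0, 4.0      # arbitrary units
custom_energy, custom_delay = 4.0, 2.0  # arbitrary units

speedup = base_delay / custom_delay              # 2.0
energy_reduction = base_energy / custom_energy   # 1.5
direct = edp(base_energy, base_delay) / edp(custom_energy, custom_delay)
via_ratios = edp_reduction(speedup, energy_reduction)
print(direct, via_ratios)
```

Both computations give the same factor, which is why a benchmark that gains speed without using more energy always sees its EDP fall by at least the speedup.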