A co-designed virtual machine for instruction-level distributed processing

  • Authors:
  • Ho-Seop Kim;James E. Smith

  • Affiliations:
  • The University of Wisconsin - Madison;The University of Wisconsin - Madison

  • Venue:
  • A co-designed virtual machine for instruction-level distributed processing
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

A current trend in high-performance superscalar processors is toward simpler designs that attempt to strike a balance between clock frequency, instruction-level parallelism, and power consumption. To achieve this goal, the thesis advocates a microarchitecture and design paradigm that rely less on low-level speculation techniques and more on simpler, modular designs with distributed processing at the instruction level, i.e., instruction-level distributed processing (ILDP). This thesis shows that designing a hardware/software co-designed virtual machine (VM) system using an accumulator-oriented instruction set architecture (ISA) and microarchitecture is a good approach for implementing complexity-effective, high-performance out-of-order superscalar machines. The following three key points support this conclusion. An accumulator-oriented instruction format and microarchitecture fit today's technology constraints better than conventional design approaches: The ILDP ISA format assigns temporary values that account for most of the register communication to a small number of accumulators. As a result, the complexity of the register file and associated hardware structures are greatly reduced. Furthermore, the dependence-oriented ILDP ISA format allows simple implementation of a complexity-effective distributed microarchitecture that is tolerant of global communication latencies. The accumulator-oriented instruction format and microarchitecture result in low-overhead dynamic binary translation (DBT): Because the underlying ILDP hardware provides a form of superscalar out-of-order processing, the dynamic binary translator does not need to perform aggressive optimizations. As a result, the dynamic binary translation overhead is greatly reduced. The co-designed VM system for ILDP performs similarly to, or better than, conventional superscalar processors having similar pipeline depths while achieving lower complexity in key pipeline structures: This reduction of complexity can be exploited to achieve either a higher clock frequency or lower power consumption, or a combination of the two. This thesis makes two main contributions. First, the major components of a co-designed VM for ILDP are fully developed: an accumulator-based ISA; a complexity-effective distributed microarchitecture; a fast and efficient DBT mechanism. Second, performance evaluations and complexity analysis support the key points of the thesis listed above.