A co-designed virtual machine for instruction-level distributed processing

Authors:
Ho-Seop Kim; James E. Smith
Affiliations:
The University of Wisconsin - Madison; The University of Wisconsin - Madison
Venue:
Thesis, The University of Wisconsin - Madison
Year:
2004

Abstract

A current trend in high-performance superscalar processors is toward simpler designs that attempt to strike a balance among clock frequency, instruction-level parallelism, and power consumption. To achieve this goal, the thesis advocates a microarchitecture and design paradigm that rely less on low-level speculation techniques and more on simpler, modular designs with distributed processing at the instruction level, i.e., instruction-level distributed processing (ILDP). This thesis shows that designing a hardware/software co-designed virtual machine (VM) system using an accumulator-oriented instruction set architecture (ISA) and microarchitecture is a good approach for implementing complexity-effective, high-performance out-of-order superscalar machines. Three key points support this conclusion.

An accumulator-oriented instruction format and microarchitecture fit today's technology constraints better than conventional design approaches. The ILDP ISA format assigns temporary values, which account for most register communication, to a small number of accumulators. As a result, the complexity of the register file and associated hardware structures is greatly reduced. Furthermore, the dependence-oriented ILDP ISA format allows a simple implementation of a complexity-effective distributed microarchitecture that tolerates global communication latencies.

The accumulator-oriented instruction format and microarchitecture result in low-overhead dynamic binary translation (DBT). Because the underlying ILDP hardware provides a form of superscalar out-of-order processing, the dynamic binary translator does not need to perform aggressive optimizations. As a result, the dynamic binary translation overhead is greatly reduced.

The co-designed VM system for ILDP performs similarly to, or better than, conventional superscalar processors with similar pipeline depths, while achieving lower complexity in key pipeline structures. This reduced complexity can be exploited to achieve a higher clock frequency, lower power consumption, or a combination of the two.

This thesis makes two main contributions. First, the major components of a co-designed VM for ILDP are fully developed: an accumulator-based ISA, a complexity-effective distributed microarchitecture, and a fast, efficient DBT mechanism. Second, performance evaluations and complexity analysis support the key points listed above.
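The abstract does not say how a translator decides which values go to accumulators. The following is therefore a hypothetical illustration, not the thesis's algorithm. It assumes a made-up three-address `Instr` format and a simplified rule: a value counts as "temporary" if it is read exactly once inside a block and does not escape the block. Under that assumption, the sketch does three things. It labels each value as temporary or global. It groups instructions into dependence chains that could each share one accumulator. It reports what fraction of in-block register reads those chains would carry. The helper names (`classify_values`, `assign_chains`, `accumulator_fraction`) are invented for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    """Hypothetical three-address instruction: opcode, optional destination, source registers."""
    op: str
    dest: Optional[str]
    srcs: tuple[str, ...]


def classify_values(block: list[Instr], live_out: set[str]) -> dict[int, str]:
    """Label each value-producing instruction index as 'temp' or 'global'.

    Assumed rule (a simplification): a value is a temporary if it is read exactly
    once in the block before its register is overwritten and is not live-out.
    """
    kinds: dict[int, str] = {}
    for i, ins in enumerate(block):
        if ins.dest is None:
            continue  # stores and similar instructions produce no register value
        uses = 0
        redefined = False
        for later in block[i + 1:]:
            uses += later.srcs.count(ins.dest)  # an instruction reads before it writes
            if later.dest == ins.dest:
                redefined = True
                break
        escapes = not redefined and ins.dest in live_out
        kinds[i] = "temp" if uses == 1 and not escapes else "global"
    return kinds


def assign_chains(block: list[Instr], kinds: dict[int, str]) -> list[int]:
    """Give each instruction a chain id; consuming a temporary joins the producer's chain.

    Instructions in one chain could share one accumulator. If an instruction reads two
    temporaries, only the first joins; a real translator would have to keep the other
    value in a general-purpose register (not modeled here).
    """
    last_def: dict[str, int] = {}  # register -> index of its most recent definition
    chain_of: list[int] = []
    next_chain = 0
    for i, ins in enumerate(block):
        chain = None
        for r in ins.srcs:
            p = last_def.get(r)
            if p is not None and kinds.get(p) == "temp":
                chain = chain_of[p]
                break
        if chain is None:  # no temporary input: start a new chain
            chain = next_chain
            next_chain += 1
        chain_of.append(chain)
        if ins.dest is not None:
            last_def[ins.dest] = i
    return chain_of


def accumulator_fraction(block: list[Instr], kinds: dict[int, str]) -> float:
    """Fraction of reads of in-block values that are satisfied by temporaries."""
    last_def: dict[str, int] = {}
    total = via_acc = 0
    for i, ins in enumerate(block):
        for r in ins.srcs:
            p = last_def.get(r)
            if p is None:
                continue  # value defined outside the block
            total += 1
            if kinds[p] == "temp":
                via_acc += 1
        if ins.dest is not None:
            last_def[ins.dest] = i
    return via_acc / total if total else 0.0


# Toy block in the hypothetical format.
BLOCK = [
    Instr("li", "r11", ()),               # r11 <- constant (read twice below)
    Instr("load", "r1", ("r10",)),        # r1 <- mem[r10]
    Instr("add", "r2", ("r1", "r11")),    # r2 <- r1 + r11
    Instr("shl", "r3", ("r2",)),          # r3 <- r2 << imm (immediate omitted)
    Instr("store", None, ("r3", "r12")),  # mem[r12] <- r3
    Instr("load", "r4", ("r13",)),        # r4 <- mem[r13]
    Instr("sub", "r5", ("r4", "r11")),    # r5 <- r4 - r11 (live-out)
]
LIVE_OUT = {"r5"}

if __name__ == "__main__":
    kinds = classify_values(BLOCK, LIVE_OUT)
    print(kinds)
    print(assign_chains(BLOCK, kinds))
    print(accumulator_fraction(BLOCK, kinds))
```

On this toy block, `r11` is read twice and `r5` is live-out, so both are global. Every other value is a temporary. The instructions form three chains, and four of the six in-block register reads go through temporaries. The sketch leaves out mapping chains onto a fixed, small set of physical accumulators, which would require tracking when each chain ends.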