High-Performance and Low-Cost Dual-Thread VLIW Processor Using Weld Architecture Paradigm

Authors:
Emre Ozer;Thomas M. Conte
Affiliations:
-;IEEE
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2005

Citing 20
Cited 2

Checkpoint repair for high-performance out-of-order execution machines

IEEE Transactions on Computers
A variable instruction stream extension to the VLIW architecture

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
IMPACT: an architectural framework for multiple-instruction-issue processors

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Processor coupling: integrating compile time and runtime scheduling for parallelism

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The anatomy of the register file in a multiscalar processor

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Simultaneous multithreading: maximizing on-chip parallelism

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Multiscalar processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Single-program speculative multithreading (SPSM) architecture: compiler-assisted fine-grained multithreading

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
The M-Machine multicomputer

Proceedings of the 28th annual international symposium on Microarchitecture
ARB: A Hardware Mechanism for Dynamic Reordering of Memory References

IEEE Transactions on Computers
Dynamically scheduled VLIW processors

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Threaded multiple path execution

Proceedings of the 25th annual international symposium on Computer architecture
A scalable approach to thread-level speculation

Proceedings of the 27th annual international symposium on Computer architecture
Lx: a technology platform for customizable VLIW embedded processing

Proceedings of the 27th annual international symposium on Computer architecture
Slipstream processors: improving both performance and fault tolerance

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Dynamically allocating processor resources between nearby and distant ILP

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Weld: A Multithreading Technique Towards Latency-Tolerant VLIW Processors

HiPC '01 Proceedings of the 8th International Conference on High Performance Computing
Treegion Scheduling for Wide Issue Processors

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Speculative Data-Driven Multithreading

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture

Mt-ADRES: multithreading on coarse-grained reconfigurable architecture

ARC'07 Proceedings of the 3rd international conference on Reconfigurable computing: architectures, tools and applications
Dynamic instruction scheduling in a trace-based multi-threaded architecture

International Journal of Parallel Programming

Quantified Score

Hi-index	0.01

Visualization

Abstract

This paper presents a cost-effective and high-performance dual-thread VLIW processor model. The dual-thread VLIW processor model is a low-cost subset of the Weld architecture paradigm. It supports one main thread and one speculative thread running simultaneously in a VLIW processor with a register file and a fetch unit per thread along with memory disambiguation hardware for speculative load and store operations. This paper analyzes the performance impact of the dual-thread VLIW processor, which includes analysis of migrating disambiguation hardware for speculative load operations to the compiler and of the sensitivity of the model to the variation of branch misprediction, second-level cache miss penalties, and register file copy time. Up to 34 percent improvement in performance can be attained using the dual-thread VLIW processor when compared to a single-threaded VLIW processor model.