Weld: A Multithreading Technique Towards Latency-Tolerant VLIW Processors

Authors:
Emre Özer;Thomas M. Conte;Saurabh Sharma
Venue:
HiPC '01 Proceedings of the 8th International Conference on High Performance Computing
Year:
2001

Abstract

This paper presents Weld, a new architecture model for VLIW processors. Weld integrates multithreading support into a VLIW processor to hide run-time latency effects that the compiler cannot determine. It does so through a novel hardware technique called operation welding, which merges operations from different threads to use the hardware resources more efficiently. Hardware contexts such as program counters and fetch units are duplicated to support multithreading. Experimental results show that the Weld architecture attains a speedup of up to 27% over a single-threaded VLIW architecture.
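
The idea of merging operations from different threads can be illustrated with a minimal sketch. It is not the paper's hardware mechanism: it assumes a fixed issue width, represents each thread's long instruction word as a list with one entry per slot (None for a NOP), and only lets an operation from the second thread move into the same slot position when the first thread leaves that slot empty. The names weld, utilization, thread_a, and thread_b are hypothetical and chosen for illustration only.

```python
from typing import List, Optional, Tuple

# One entry per issue slot; None marks an empty slot (NOP).
# This representation is an assumption made for illustration.
Word = List[Optional[str]]


def weld(primary: Word, secondary: Word) -> Tuple[Word, Word]:
    """Fill the primary word's empty slots with same-slot operations
    from the secondary word.

    Returns (issued_word, leftover_secondary), where leftover_secondary
    holds the secondary operations that could not be merged this cycle.
    """
    if len(primary) != len(secondary):
        raise ValueError("both words must have the same issue width")
    issued: Word = []
    leftover: Word = []
    for p_op, s_op in zip(primary, secondary):
        if p_op is None and s_op is not None:
            # Slot is free in the primary thread: take the secondary operation.
            issued.append(s_op)
            leftover.append(None)
        else:
            # Slot is occupied (or both are empty): the secondary operation waits.
            issued.append(p_op)
            leftover.append(s_op)
    return issued, leftover


def utilization(word: Word) -> float:
    """Fraction of issue slots holding a real operation."""
    return sum(op is not None for op in word) / len(word)


thread_a: Word = ["add", None, "load", None]
thread_b: Word = [None, "mul", "store", "branch"]
issued, leftover = weld(thread_a, thread_b)
print(issued, leftover)
print(utilization(thread_a), utilization(issued))
```

In this toy case the first thread alone fills half of the four slots, while the merged word fills all of them and one operation from the second thread is left for a later cycle. A real design would also have to respect functional-unit types and dependences, which this sketch ignores.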