Dynamic instruction scheduling in a trace-based multi-threaded architecture

Authors:
Peter A. Rounce; Alberto F. De Souza
Affiliations:
Department of Computer Science, University College London, London, UK; Departamento de Informática, Universidade Federal do Espírito Santo, Vitoria, ES, Brazil
Venue:
International Journal of Parallel Programming
Year:
2008

Abstract

Simulation results are presented for using the hardware-implemented, trace-based dynamic instruction scheduler of our single-process DTSVLIW architecture to schedule instructions from several processes into multiple streams of VLIW instructions for execution by a wide-issue, simultaneous multi-threading (SMT) execution engine. The scheduling process involves single-instruction (scalar) execution of each process, with the executed instructions dynamically scheduled into blocks of VLIW instructions that are cached for subsequent SMT execution. SMT provides a mechanism to reduce the impact of the horizontal and vertical waste, and of the variable memory latencies, seen in the DTSVLIW. Preliminary experiments explore this extended model. The results show PE utilization of up to 87% on a 4-thread, 1-scalar, 8-PE design, with speed-ups of up to 6.3 times over a single processor. Notably, only a single scalar process needs to be scheduled at any time, and main memory fetches are 1-4% of those of a single processor.
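
The abstract's terms "horizontal waste", "vertical waste", and "PE utilization" can be illustrated with a small calculation. The sketch below is not the authors' simulator. It assumes the usual SMT definitions: vertical waste is the set of issue slots lost in cycles where nothing issues, and horizontal waste is the set of unused slots in cycles where at least one instruction issues. The function name `slot_usage`, the `WasteReport` type, and the sample trace are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class WasteReport:
    utilization: float       # fraction of PE slots that did useful work
    horizontal_waste: float  # fraction of slots idle in partly filled cycles
    vertical_waste: float    # fraction of slots idle in completely empty cycles


def slot_usage(issued_per_cycle, num_pes):
    """Classify every PE slot of a trace as used, horizontally wasted,
    or vertically wasted (hypothetical helper, assumed definitions)."""
    total_slots = len(issued_per_cycle) * num_pes
    if total_slots == 0:
        raise ValueError("need at least one cycle and one PE")

    used = horizontal = vertical = 0
    for issued in issued_per_cycle:
        if not 0 <= issued <= num_pes:
            raise ValueError("issued count must lie between 0 and num_pes")
        if issued == 0:
            # Nothing issued this cycle: the whole row of slots is lost.
            vertical += num_pes
        else:
            used += issued
            horizontal += num_pes - issued

    return WasteReport(
        utilization=used / total_slots,
        horizontal_waste=horizontal / total_slots,
        vertical_waste=vertical / total_slots,
    )


# Made-up trace: instructions issued in each of 8 cycles on an 8-PE engine.
trace = [8, 6, 0, 7, 8, 0, 5, 8]
report = slot_usage(trace, num_pes=8)
print(report)
```

In this made-up trace the two empty cycles account for all of the vertical waste. An SMT engine would try to fill such cycles, and the partly empty ones, with VLIW instructions from another thread's cached blocks, which is how it can raise PE utilization.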