Analysis of execution efficiency in the microthreaded processor UTLEON3

Authors:
Jaroslav Sykora; Leos Kafka; Martin Danek; Lukas Kohout
Affiliations:
Institute of Information Theory and Automation of the ASCR, Department of Signal Processing, Prague, Czech Republic (all authors)
Venue:
ARCS'11 Proceedings of the 24th international conference on Architecture of computing systems
Year:
2011

Abstract

We analyse the impact of long-latency instructions, the family blocksize parameter, and the thread switch modifier on the execution efficiency of families of threads in a single-core configuration of the UTLEON3 processor, which implements the SVP microthreading model. The analysis is supported by code execution on an FPGA implementation of the processor. By classifying long-latency operations as either pipelined (e.g. floating-point operations) or non-pipelined (e.g. cache faults), we show that the blocksize parameter, which controls resource utilization in the microthreaded processor, has a profound effect when the latency is pipelined: increasing the blocksize can improve performance. In the non-pipelined long-latency case, efficiency reaches its maximum even at a small blocksize, beyond which it cannot improve because an exclusive resource is occupied (memory bus congestion). The conclusions drawn in this paper can be used to optimize code compilation for the microthreaded processor: since the compiler specifies the blocksize parameter for each family of threads individually, it can optimize the register file utilization of the processor.
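To illustrate why the two latency classes respond differently to blocksize, here is a minimal sketch of a toy upper-bound model. It is not the authors' model and was not fitted to UTLEON3 measurements; the function `efficiency` and its parameters `compute` and `latency` are hypothetical. It assumes each thread repeatedly issues `compute` single-cycle instructions followed by one long-latency operation of `latency` cycles, and that blocksize bounds how many threads of a family are resident at once.

```python
from fractions import Fraction


def efficiency(blocksize: int, compute: int, latency: int, pipelined: bool) -> Fraction:
    """Upper bound on pipeline utilization (0..1) in a hypothetical toy model.

    Each resident thread loops: `compute` single-cycle instructions, then one
    long-latency operation lasting `latency` cycles. Queueing delays, the thread
    switch modifier and cache behaviour are ignored.
    """
    if blocksize < 1 or compute < 1 or latency < 1:
        raise ValueError("blocksize, compute and latency must be positive")

    # Concurrency bound (Little's law): `blocksize` threads in flight, each
    # iteration spans compute + latency cycles and contributes `compute` busy cycles.
    bound = min(Fraction(1), Fraction(blocksize * compute, compute + latency))

    if not pipelined:
        # Exclusive resource (e.g. a memory bus serving one request at a time):
        # every iteration occupies it for `latency` cycles, so at most one
        # iteration completes per `latency` cycles, whatever the blocksize.
        bound = min(bound, Fraction(compute, latency))
    return bound


if __name__ == "__main__":
    for b in (1, 2, 4, 6, 8):
        p = efficiency(b, compute=4, latency=20, pipelined=True)
        n = efficiency(b, compute=4, latency=20, pipelined=False)
        print(f"blocksize={b}: pipelined={float(p):.2f} non-pipelined={float(n):.2f}")
```

In this toy model the pipelined bound grows linearly with blocksize until the pipeline is full (blocksize 6 for the example values), while the non-pipelined bound flattens at compute/latency once blocksize reaches 1 + compute/latency, so a blocksize of 2 already hits the ceiling. Because queueing is ignored, the figures are only upper bounds.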