Executing a Program on the MIT Tagged-Token Dataflow Architecture
IEEE Transactions on Computers
Register relocation: flexible contexts for multithreading
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Improving Latency Tolerance of Multithreading through Decoupling
IEEE Transactions on Computers
A survey of processors with explicit multithreading
ACM Computing Surveys (CSUR)
Euro-Par '96 Proceedings of the Second International Euro-Par Conference on Parallel Processing-Volume II
Mini-Threads: Increasing TLP on Small-Scale SMT Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Beating in-order stalls with "flea-flicker" two-pass pipelining
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Decoupled Software Pipelining with the Synchronization Array
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
On the potential of latency tolerant execution in speculative multithreading
IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
Many-Core vs. Many-Thread Machines: Stay Away From the Valley
IEEE Computer Architecture Letters
MIPS MT: a multithreaded RISC architecture for embedded real-time processing
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
µTC: an intermediate language for programming chip multiprocessors
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
Heterogeneous integration to simplify many-core architecture simulations
Proceedings of the 2012 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools
Apple-CORE: Harnessing general-purpose many-cores with hardware concurrency management
Microprocessors & Microsystems
Hi-index | 0.00 |
We analyse an impact of long-latency instructions, the family blocksize parameter, and the thread switch modifier on execution efficiency of families of threads in a single-core configuration of the UTLEON3 processor that implements the SVP microthreading model. The analysis is supported by code execution in an FPGA implementation of the processor. By classifying long-latency operations as either pipelined (e.g. floatingpoint operations) or non-pipelined (e.g. cache faults) we show that the blocksize parameter that controls resource utilization in the microthreaded processor has profound effects when the latency is pipelined, i.e. increasing the blocksize can improve the performance. In the nonpipelined long-latency case the efficiency reaches its maximum even with a small value of blocksize beyond which it cannot improve due to occupancy of an exclusive resource (memory bus congestion). The conclusions drawn in this paper can be used to optimize code compilation for the microthreaded processor. As the compiler specifies the blocksize parameter for each family of threads individually, it can optimize the register file utilization of the processor.