Implementing Fine/Medium Grained TLP Support in a Many-Core Architecture

Authors:
Roberto Giorgi; Zdravko Popovic; Nikola Puzovic
Affiliations:
Department of Information Engineering, University of Siena, Italy; Department of Information Engineering, University of Siena, Italy; Department of Information Engineering, University of Siena, Italy
Venue:
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Year:
2009

Abstract

We believe that future many-core architectures should support a simple and scalable way to execute the many threads generated by parallel programs. A good candidate for efficient and scalable thread execution is the DTA (Decoupled Threaded Architecture), which is designed to exploit fine/medium grained Thread Level Parallelism (TLP) by using a hardware scheduling unit and relying on existing simple cores. In this paper, we present an initial implementation of the DTA concept in a many-core architecture, where it interacts with other architectural components designed from scratch in order to address the problem of scalability. We present initial results, obtained using a many-core simulator written in SARCSim (a variant of UNISIM) with DTA support, that show the scalability of the solution.