This paper presents the design and implementation of TNT (TiNy-Threads), a thread virtual machine for the IBM Cyclops64 (C64) architecture that serves as the cornerstone of the C64 system software. Cyclops64 is the latest Cyclops architecture; it employs a unique multiprocessor-on-a-chip design with a very large number of hardware thread units and embedded memory. We show how to achieve high efficiency by mapping (and matching) the TNT thread model directly onto Cyclops ISA features, assisted by a native TNT thread runtime library. The main results of our experimental study demonstrate the good efficiency, scalability, and usability of the TNT model and its implementation.