This paper presents the design and implementation of a runtime system, named "GodRunner", for the Godson-T many-core processor that supports task-level parallelism efficiently and flexibly. GodRunner abstracts the underlying hardware resources and provides an easy-to-use programming interface. A two-grade task management mechanism is proposed to support both coarse-grained and fine-grained multithreading efficiently. GodRunner flexibly combines two load-balancing scheduling policies. Because task management is software-controlled, GodRunner is more configurable and extensible than hard-wired schemes. Experiments show that the tasking overhead in GodRunner is as small as hundreds of cycles, which is hundreds of times faster than conventional Pthread-based multithreading on an SMP machine. Furthermore, the approach scales well and optimally supports fine-grained tasks as small as 20k cycles.