MULTILISP: a language for concurrent symbolic computation
ACM Transactions on Programming Languages and Systems (TOPLAS)
The Stanford Dash Multiprocessor
Computer
Cilk: an efficient multithreaded runtime system
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Efficient load balancing for wide-area divide-and-conquer applications
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Strategies for Dynamic Load Balancing on Highly Parallel Computers
IEEE Transactions on Parallel and Distributed Systems
Executing functional programs on a virtual tree of processors
FPCA '81 Proceedings of the 1981 conference on Functional programming languages and computer architecture
A Network on Chip Architecture and Design Methodology
ISVLSI '02 Proceedings of the IEEE Computer Society Annual Symposium on VLSI
Cilk: efficient multithreaded computing
Cilk: efficient multithreaded computing
The need for adaptive dynamic thread scheduling
High performance scientific and engineering computing
Transactional Memory Coherence and Consistency
Proceedings of the 31st annual international symposium on Computer architecture
Low-Latency Virtual-Channel Routers for On-Chip Networks
Proceedings of the 31st annual international symposium on Computer architecture
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
Æthereal Network on Chip: Concepts, Architectures, and Implementations
IEEE Design & Test
Hardware-modulated parallelism in chip multiprocessors
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
CAPSULE: Hardware-Assisted Parallel Execution of Component-Based Programs
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Carbon: architectural support for fine-grained parallelism on chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
IEEE Computer Architecture Letters
Intel® threading building blocks
Journal of Computing Sciences in Colleges
Proceedings of the conference on Design, automation and test in Europe
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Quality-of-service and error control techniques for mesh-based network-on-chip architectures
Integration, the VLSI Journal - Special issue: ACM great lakes symposium on VLSI
A survey and comparison of wormhole routing techniques in a mesh networks
IEEE Network: The Magazine of Global Internetworking
Hi-index | 0.00 |
Parallel programming approaches based on task division/spawning are getting increasingly popular because they provide for a simple and elegant abstraction of parallelization, while achieving good performance on workloads which are traditionally complex to parallelize due to the complex control flow and data structures involved. The ability to quickly distribute fine-granularity tasks among many cores is key to the efficiency and scalability of such division-based parallel programming approaches. For this reason, several hardware supports for work stealing environments have already been proposed. However, they all rely on a central hardware structure for distributing tasks among cores, which hampers the scalability and efficiency of these schemes. In this paper, we focus on conditional division, a division-based parallel approach which provides the additional benefit, over work-stealing approaches, of releasing the user from dealing with task granularity and which does not clog hardware resources with an exceedingly large number of small tasks. For this type of division-based approaches, we show that it is possible to design hardware support for speeding up task division that entirely relies on local information, and which thus exhibits good scalability properties.