Compiling Fresh Breeze Codelets
Proceedings of Programming Models and Applications on Multicores and Manycores
The Fresh Breeze memory model and system architecture are proposed as an approach to achieving significant improvements in massively parallel computation by supporting fine-grain management of memory and processing resources and by providing a global shared name space for all processors and computation tasks. Memory management and task scheduling are performed in hardware, eliminating nearly all operating system execution cycles spent on data access, task scheduling, and security. In particular, the Fresh Breeze memory model represents all data objects as trees of fixed-size chunks of memory, which eliminates data consistency issues and simplifies memory management. Low-cost reference-count garbage collection supports modular programming in type-safe programming languages.

The main contributions of this paper are: (1) a program execution model for massively parallel computing, presented as the Fresh Breeze application programming interface (API) and comprising a radical memory model and a scheme for expressing concurrency; (2) an experimental implementation of the API through simulation using the FAST simulator of the IBM Cyclops 64 many-core chip; and (3) simulation results demonstrating that (a) fine-grain, hardware-implemented resource management mechanisms can support massive parallelism and high processor utilization through the latency-hiding properties of multi-tasking, and (b) a hardware implementation of work stealing, incorporated in our simulation, can effectively distribute tasks over the processors of a many-core parallel computer.
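To make the memory model concrete, the following Java fragment is a minimal sketch of representing a data object as a tree of fixed-size chunks reclaimed by reference counting. The 16-slot chunk size, the seal/write-once discipline, the ownership-transfer convention on stores, and all class and method names are illustrative assumptions for this sketch, not the Fresh Breeze API or its hardware realization.

```java
// Hypothetical sketch: data objects as trees of fixed-size chunks with
// reference-count reclamation. Chunk size and names are assumptions.
final class Chunk {
    static final int SLOTS = 16;              // assumed fixed chunk size
    final Object[] slots = new Object[SLOTS];
    private int refCount = 1;                 // the creator holds the initial reference
    private boolean sealed = false;

    // Store a value into a slot. If the value is a chunk, the parent adopts the
    // caller's reference (ownership transfer), so no extra increment is needed.
    void write(int i, Object v) {
        if (sealed) throw new IllegalStateException("chunk is immutable once sealed");
        slots[i] = v;
    }

    Chunk seal() { sealed = true; return this; }   // write-once: no updates after sealing

    void retain() { refCount++; }                  // share a subtree with another parent

    // Sealed chunks can only reference chunks created earlier, so the heap stays
    // acyclic and reference counting alone reclaims every unreachable chunk.
    void release() {
        if (--refCount > 0) return;
        for (Object v : slots) {
            if (v instanceof Chunk) ((Chunk) v).release();
        }
    }

    // Build a chunk tree over an array of long values, SLOTS values per leaf.
    static Chunk fromLongs(long[] data) {
        java.util.List<Chunk> level = new java.util.ArrayList<>();
        for (int i = 0; i < data.length; i += SLOTS) {
            Chunk leaf = new Chunk();
            for (int j = i; j < Math.min(i + SLOTS, data.length); j++) {
                leaf.write(j - i, data[j]);
            }
            level.add(leaf.seal());
        }
        while (level.size() > 1) {                 // add parent levels until one root remains
            java.util.List<Chunk> parents = new java.util.ArrayList<>();
            for (int i = 0; i < level.size(); i += SLOTS) {
                Chunk p = new Chunk();
                for (int j = i; j < Math.min(i + SLOTS, level.size()); j++) {
                    p.write(j - i, level.get(j));
                }
                parents.add(p.seal());
            }
            level = parents;
        }
        return level.isEmpty() ? new Chunk().seal() : level.get(0);
    }
}
```

In this sketch a caller would build a tree with Chunk.fromLongs(values) and later call release() on the root; because sealed chunks are immutable, subtrees can be shared across tasks after a retain() without any coherence protocol, which is what keeps the reference-count scheme cheap.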
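The work-stealing policy mentioned in contribution (3b) can likewise be sketched in software. The paper's scheme is implemented in hardware inside the simulation; the fragment below only illustrates the general policy (local push/pop at the bottom of a per-worker deque, steals from the top of a random victim) under assumed class names and a simplified termination condition.

```java
// Hypothetical sketch of the work-stealing policy; not the paper's hardware scheme.
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.ThreadLocalRandom;

final class Worker {
    final ConcurrentLinkedDeque<Runnable> deque = new ConcurrentLinkedDeque<>();
    final Worker[] all;                      // all workers, used to pick steal victims
    Worker(Worker[] all) { this.all = all; }

    void spawn(Runnable task) { deque.addLast(task); }   // local tasks go to the bottom

    void run() {
        while (true) {
            Runnable t = deque.pollLast();               // prefer own most recent task
            if (t == null) t = steal();                  // otherwise try to steal
            if (t == null) break;                        // simplified: give up when empty
            t.run();
        }
    }

    private Runnable steal() {
        Worker victim = all[ThreadLocalRandom.current().nextInt(all.length)];
        return victim == this ? null : victim.deque.pollFirst();  // take the oldest task
    }
}
```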