While the semiconductor industry has provided powerful systems for personal supercomputing, efficiently harnessing the computing power of these systems remains a major unsolved problem. This challenge must be approached by solving the synchronization problem and the parallel-programmability problem together. This paper reviews the synchronization issues in modern parallel computer architectures, surveys the state-of-the-art approaches used to alleviate these problems, and proposes our Request-Store-Forward (RSF) model of synchronization. The model splits each atomic synchronization operation into two phases, freeing the processing elements from polling. Finally, we show how we can learn from nature and improve overall system performance by closely coupling peripheral computing units and functional units.
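The split-phase idea behind RSF can be illustrated with a minimal sketch. The abstract only states that atomic synchronization operations are divided into two phases so that processing elements need not poll; the class below is a hypothetical illustration of that principle (the names `RSFCell`, `request`, and `write` are our own, not taken from the paper), in which unsatisfiable requests are stored at the data's location and the result is forwarded to the requester once a producer writes it.

```python
class RSFCell:
    """A single synchronized memory cell under a split-phase protocol.

    Phase 1 (Request): a consumer asks for the value. If the value is
    not yet present, the request is Stored at the cell instead of the
    consumer polling for it.
    Phase 2 (Forward): when a producer writes the value, every stored
    request is Forwarded the result via its callback.
    """

    def __init__(self):
        self.value = None
        self.full = False       # has a producer written yet?
        self.pending = []       # stored (deferred) requests

    def request(self, callback):
        """Consumer side: ask for the value; never blocks or polls."""
        if self.full:
            callback(self.value)            # data ready: forward now
        else:
            self.pending.append(callback)   # store the request

    def write(self, value):
        """Producer side: publish the value, then forward to waiters."""
        self.value, self.full = value, True
        waiters, self.pending = self.pending, []
        for cb in waiters:
            cb(value)                       # forward phase

cell = RSFCell()
results = []
cell.request(results.append)   # arrives early: request is stored
cell.write(42)                 # write forwards 42 to the stored request
cell.request(results.append)   # arrives late: forwarded immediately
# results == [42, 42]
```

Because the consumer registers a continuation rather than spinning on a flag, the processing element is free to do other work between the request and the forward, which is the property the abstract attributes to the two-phase split.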