Efficient fine-grain synchronization is essential for effectively harnessing the computational power of many-core architectures. However, designing and implementing fine-grain synchronization in such architectures presents several challenges, including synchronization-induced overhead, storage cost, scalability, and the level of granularity at which synchronization is applicable. This paper proposes the Synchronization State Buffer (SSB), a scalable architectural design for fine-grain synchronization that efficiently performs synchronization between concurrent threads. The design of SSB is motivated by the following observation: at any instant during parallel execution, only a small fraction of memory locations are actively participating in synchronization. Based on this observation, we present a fine-grain synchronization design that records and manages the states of frequently synchronized data using modest hardware support. We have implemented the SSB design in the context of the 160-core IBM Cyclops-64 architecture. Using detailed simulation, we present our experience with a set of benchmarks with different workload characteristics.
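To make the motivating observation concrete, the following is a minimal software sketch of the general idea, not the SSB hardware design itself. It assumes a small fixed-capacity buffer that holds lock state only for addresses currently being synchronized, allocates an entry on acquire, and frees it on release. The class name `SyncStateBuffer`, its methods, the status strings, and the "overflow" behavior when the buffer is full are all hypothetical choices made for illustration.

```python
class SyncStateBuffer:
    """Toy model (hypothetical): keeps lock state only for active addresses."""

    def __init__(self, capacity):
        self.capacity = capacity   # number of entries the buffer can hold
        self.entries = {}          # address -> id of the thread holding it

    def acquire(self, addr, tid):
        """Try to lock addr for thread tid; returns a status string."""
        if addr in self.entries:
            return "busy"          # another thread already holds this address
        if len(self.entries) >= self.capacity:
            return "overflow"      # assumption: caller must handle a full buffer
        self.entries[addr] = tid   # allocate an entry only while the lock is held
        return "ok"

    def release(self, addr, tid):
        """Unlock addr if tid holds it; the entry is freed immediately."""
        if self.entries.get(addr) != tid:
            return False
        del self.entries[addr]
        return True

    def occupancy(self):
        """Number of addresses actively participating in synchronization."""
        return len(self.entries)


if __name__ == "__main__":
    ssb = SyncStateBuffer(capacity=2)
    print(ssb.acquire(0x10, tid=0))
    print(ssb.acquire(0x10, tid=1))
    print(ssb.release(0x10, tid=0))
    print(ssb.occupancy())
```

In this sketch the storage cost depends on the buffer capacity rather than on the size of memory, which is the property the observation above suggests: no per-word synchronization bits are needed for locations that are not currently in use.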