Architecture of a message-driven processor
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
I-structures: data structures for parallel computing
ACM Transactions on Programming Languages and Systems (TOPLAS)
The J-machine multicomputer: an architectural evaluation
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Transactional memory: architectural support for lock-free data structures
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Proceedings of the 28th annual international symposium on Microarchitecture
ICS '90 Proceedings of the 4th international conference on Supercomputing
Monsoon: an explicit token-store architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Proceedings of the International Sympoisum on Theoretical Programming
CEDAR: a large scale multiprocessor
ACM SIGARCH Computer Architecture News
TiNy Threads: A Thread Virtual Machine for the Cyclops64 Cellular Architecture
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 14 - Volume 15
Proceedings of the 2nd conference on Computing frontiers
The Atomos transactional programming language
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Toward a Software Infrastructure for the Cyclops-64 Cellular Architecture
HPCS '06 Proceedings of the 20th International Symposium on High-Performance Computing in an Advanced Collaborative Environment
Lightweight lock-free synchronization methods for multithreading
Proceedings of the 20th annual international conference on Supercomputing
Proceedings of the 34th annual international symposium on Computer architecture
A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
Hi-index | 0.00 |
The Cray XMT architecture has incited curiosity among computer architect and system software designers for its architecture support of fine-grain in-memory synchronization. Although such discussion go back thirty years, there is a lack of practical experimental platforms that can evaluate major technological trends, such as fine-grain in-memory synchronization. The need for these platforms becomes apparent when dealing with new massive many-core designs and applications. This paper studies the feasibility, usefulness and trade-offs of fine-grain in-memory synchronization support in a real-world large-scale many-core chip (IBM Cyclops-64). We extended the original Cyclops-64 architecture design at gate level to support the fine-grain in-memory synchronization feature. We performed an in-depth study of a well-known kernel code: the wavefront computation. Several versions of the kernel were used to test the effects of different synchronization constructs using our chip emulation framework. Furthermore, we tested selected OpenMP kernel loops against existing software-based synchronization approaches. In our wavefront benchmark study, the combination of fine-grain dataflow-like in-memory synchronization with non-strict scheduling methods yields a thirty percent improvement over the best optimized traditional synchronization method provided by the original Cyclops-64 design. For the OpenMP kernel loops, we achieved speeds of three to fourteen times the speed of software-based synchronization methods.