FPGA-based prototype of a PRAM-on-chip processor
Proceedings of the 5th conference on Computing frontiers
An area-efficient high-throughput hybrid interconnection network for single-chip parallel processing
Proceedings of the 45th annual Design Automation Conference
A pilot study to compare programming effort for two parallel programming models
Journal of Systems and Software
Mesh-of-trees and alternative interconnection networks for single-chip parallelism
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
A Low-Overhead Asynchronous Interconnection Network for GALS Chip Multiprocessors
NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
Using simple abstraction to reinvent computing for parallelism
Communications of the ACM
A low-latency adaptive asynchronous interconnection network using bi-modal router nodes
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
OTIS-MOT: an efficient interconnection network for parallel processing
The Journal of Supercomputing
Better speedups using simpler parallel programming for graph connectivity and biconnectivity
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Thermal management of a many-core processor under fine-grained parallelism
Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing
A Mesh of Trees (MoT) on-chip interconnection network has been proposed recently to provide high throughput between memory units and processors for single-chip parallel processing [5]. In this paper, we report our findings in bringing this concept to silicon. Specifically, we conduct cycle-accurate Verilog simulations to verify the analytical results claimed in [5]. We synthesize and obtain the layout of MoT interconnection networks of various sizes. To further improve throughput, we investigate different arbitration primitives to handle load and store, the two most common memory operations. We also study the use of pipeline registers in large networks when there are long wires. Simulation based on full network layout demonstrates that significant throughput improvement can be achieved over the originally proposed MoT interconnection network.

The importance of this work lies in its validation of performance features of the MoT interconnection network, as they were previously shown to be competitive with traditional network solutions. The MoT network is currently used in an eXplicit Multi-Threading (XMT) on-chip parallel processor, which is engineered to support parallel programming. In that context, a 32-terminal MoT network could support up to 512 on-chip XMT processors. Our 8-terminal network, which could serve 8 processor clusters (or 128 total processors), was also accepted recently for fabrication.
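The processor counts quoted in the abstract follow from simple arithmetic, which the short Python sketch below reproduces. It assumes that each network terminal serves exactly one processor cluster and that every cluster holds the same number of processors. The cluster size of 16 is not stated in the abstract; it is inferred from the 128 processors spread over 8 clusters. The function name `processors_supported` is hypothetical and used only for illustration.

```python
# Assumption: cluster size inferred from the abstract's 8-terminal figures
# (128 total processors served through 8 clusters).
PROCESSORS_PER_CLUSTER = 128 // 8


def processors_supported(terminals: int,
                         per_cluster: int = PROCESSORS_PER_CLUSTER) -> int:
    """Total processors when each network terminal serves one cluster."""
    if terminals <= 0 or per_cluster <= 0:
        raise ValueError("terminals and per_cluster must be positive")
    return terminals * per_cluster


if __name__ == "__main__":
    # The two network sizes mentioned in the abstract.
    for terminals in (8, 32):
        print(terminals, "terminals:", processors_supported(terminals), "processors")
```

Under these assumptions, the same cluster size accounts for both figures in the abstract: 128 processors for the 8-terminal network and 512 for the 32-terminal one.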