Emulated shared memory (ESM) multiprocessor systems on chip (MP-SOC) and network on chip (NOC) regions are efficient general-purpose computing engines for future computers and embedded systems that run applications unknown at the design phase. While they provide the programmer with a synchronous, unified shared memory accessible in constant time, existing ESM architectures have been shown to be inefficient for workloads with low parallelism. In this paper we outline a configurable emulated shared memory (CESM) architecture that retains the advantages of ESM architectures for sufficiently parallel code but can also execute applications with low parallelism efficiently. It does so by allowing multiple threads to join into a single nonuniform memory access (NUMA) bunch and by organizing the memory system to support NUMA-like behavior for thread-local data when parallelism is limited. Performance simulations as well as silicon area and power consumption estimates for CESM MP-SOC/NOC regions are provided.