Programming a Multicore Architecture without Coherency and Atomic Operations
Proceedings of Programming Models and Applications on Multicores and Manycores
Hi-index | 0.00 |
Real-time embedded systems like smartphones tend to comprise an ever increasing number of processing cores. For scalability and the need for guaranteed performance, the use of a connection-oriented network-on-chip (NoC) is advocated. Furthermore, a distributed shared memory architecture is preferred as it simplifies software development for a multicore system. In this paper, experimental evidence is provided, showing that replacing a connection-oriented NoC by a connectionless one in a distributed shared memory system reduces the hardware costs and improves the performance. We observed that our FPGA could only support an 8-core system with a connection-oriented NoC. We exchanged the NoC with our tree-shaped, connectionless network and a ring, allowing a 32-core system in the same FPGA, mainly because of a reduced number of physical connections. Although the analytical worst-case performance slightly decreased, measurements show that the latency of latency-critical memory reads was reduced by 52% on average.