FFTs in external or hierarchical memory
The Journal of Supercomputing
The performance of multiprogrammed multiprocessor scheduling algorithms
SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
The performance advantages of integrating block data transfer in cache-coherent multiprocessors
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
IEEE Transactions on Parallel and Distributed Systems
Memory-Intensive Benchmarks: IRAM vs. Cache-Based Machines
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
A Two-step Genetic Algorithm for Mapping Task Graphs to a Network on Chip Architecture
DSD '03 Proceedings of the Euromicro Symposium on Digital Systems Design
Proceedings of the conference on Design, automation and test in Europe - Volume 1
An architectural co-synthesis algorithm for energy-aware Network-on-Chip design
Journal of Systems Architecture: the EUROMICRO Journal
Achieving predictable performance through better memory controller placement in many-core CMPs
Proceedings of the 36th annual international symposium on Computer architecture
Hi-index | 0.00 |
In this paper, we study and analyze process scheduling problems for future multicore processors. It is expected that hundreds or even thousands of cores will be integrated on a single chip, known as a Chip Multiprocessor (CMP). However, operating system process scheduling, one of the most important design issues for CMP systems, has not been well addressed. We define a model for future CMPs, based on which a scheduling algorithm is proposed to reduce on-chip communication latencies and improve performance. The impact of memory access and inter process communication (IPC) in scheduling are analyzed. We explore six typical core allocation strategies. Results show that, a strategy with a balanced consideration of both IPC and memory access out-performs other strategies, the two metrics (misses per thousand instructions and cache hit latencies) are reduced up to 25.97% and 13.11%, respectively.