FFTs in external or hierarchical memory
The Journal of Supercomputing
The performance of multiprogrammed multiprocessor scheduling algorithms
SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
A comparison of sorting algorithms for the connection machine CM-2
SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
Getting to the bottom of deep submicron
Proceedings of the 1998 IEEE/ACM international conference on Computer-aided design
Route packets, not wires: on-chip inteconnection networks
Proceedings of the 38th annual Design Automation Conference
IEEE Transactions on Parallel and Distributed Systems
A Two-step Genetic Algorithm for Mapping Task Graphs to a Network on Chip Architecture
DSD '03 Proceedings of the Euromicro Symposium on Digital Systems Design
Proceedings of the conference on Design, automation and test in Europe - Volume 1
Dynamic Critical Path Scheduling Parallel Programs onto Multiprocessors
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 8 - Volume 09
An architectural co-synthesis algorithm for energy-aware Network-on-Chip design
Journal of Systems Architecture: the EUROMICRO Journal
Hi-index | 0.00 |
In this paper, we propose a greedy heuristic approximation scheduling algorithm for future multicore processors. It is expected that hundreds of cores will be integrated on a single chip, known as a Chip Multiprocessor (CMP). To reduce on-chip communication delay, 3D integration with Through Silicon Vias (TSVs) is introduced to replace the 2D counterpart. Multiple functional layers can be stacked in a 3D CMP. However, operating system process scheduling, one of the most important design issues for CMP systems, has not been well addressed for such a system. We define a model for future 3D CMPs, based on which a scheduling algorithm is proposed to reduce cache access latencies and the delay of inter process communications (IPC). We explore different scheduling possibilities and discuss the advantages and disadvantages of our algorithm. We present benchmark results using a cycle accurate full system simulator based on realistic workloads. Experiments show that under two workloads, the execution times of our scheduling in two configurations (2 and 4 threads) are reduced by 15.58% and 8.13% respectively, compared with the other schedulings. Our study provides a guideline for designing scheduling algorithms for 3D multicore processors.