The von Neumann architecture is due for retirement
HotOS'13 Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems
Modern computing substrates like cloud computing clusters, massively multi-core processors, and general-purpose GPUs offer a wealth of computing power with the caveat that programmers must carefully structure their programs to fit within various hardware and communication limits in order to get high performance. Unfortunately, modern abstractions expressly hide machine structure from the program and program structure from the machine, hindering automatic optimization. We introduce an alternative computing abstraction--a linked graph of finite-sized memory chunks--that explicitly exposes the size and structure of programs to the operating system. The chunk graph enables the operating system to use size and structure to optimize how programs are mapped onto complex machine structure.
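To make the idea of a linked graph of finite-sized chunks concrete, here is a minimal sketch in Python. It is not the paper's design: the slot count `CHUNK_SLOTS`, the `Chunk` class, and the `reachable` helper are all hypothetical names chosen for illustration. The sketch assumes only what the abstract states, namely that each chunk has a bounded size and that links between chunks are explicit, so a system holding the graph can see how large a program's data is and how it is connected.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical capacity; the abstract says only that chunks are "finite-sized".
CHUNK_SLOTS = 4


@dataclass(eq=False)  # compare chunks by identity, since graphs may be cyclic
class Chunk:
    """A fixed-capacity block whose slots hold plain values or links to chunks."""
    slots: List[Union[int, "Chunk", None]] = field(
        default_factory=lambda: [None] * CHUNK_SLOTS
    )

    def store(self, index: int, value: Union[int, "Chunk"]) -> None:
        # A chunk can never grow: writes outside its fixed slots are rejected.
        if not 0 <= index < CHUNK_SLOTS:
            raise IndexError(f"chunk has no slot {index}")
        self.slots[index] = value

    def links(self) -> List["Chunk"]:
        # Links are explicit, so structure is visible without guessing at pointers.
        return [s for s in self.slots if isinstance(s, Chunk)]


def reachable(root: Chunk) -> List[Chunk]:
    """Return every chunk that can be reached from `root` by following links."""
    seen, order, stack = set(), [], [root]
    while stack:
        chunk = stack.pop()
        if id(chunk) in seen:
            continue
        seen.add(id(chunk))
        order.append(chunk)
        stack.extend(chunk.links())
    return order
```

Under these assumptions, `len(reachable(root)) * CHUNK_SLOTS` gives an upper bound on the slots a computation rooted at `root` can touch. That bound is the kind of size-and-structure information the abstract argues an operating system could use when deciding where to place data.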