Mach and Matchmaker: kernel and language support for object-oriented distributed systems
OOPLSA '86 Conference proceedings on Object-oriented programming systems, languages and applications
The Sprite Network Operating System
Computer
The Performance Implications of Thread Management Alternatives for Shared-Memory Multiprocessors
IEEE Transactions on Computers
Algorithms for scalable synchronization on shared-memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
Compiling with continuations
Scheduler activations: effective kernel support for the user-level management of parallelism
ACM Transactions on Computer Systems (TOCS)
Transactional memory: architectural support for lock-free data structures
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Cilk: an efficient multithreaded runtime system
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Exokernel: an operating system architecture for application-level resource management
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Extensibility safety and performance in the SPIN operating system
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
The Flux OSKit: a substrate for kernel and language research
Proceedings of the sixteenth ACM symposium on Operating systems principles
Disco: running commodity operating systems on scalable multiprocessors
Proceedings of the sixteenth ACM symposium on Operating systems principles
Multithreaded programming with Pthreads
Multithreaded programming with Pthreads
Memory allocation for long-running server applications
Proceedings of the 1st international symposium on Memory management
Functional divisions in the Piglet multiprocessor operating system
Proceedings of the 8th ACM SIGOPS European workshop on Support for composing distributed applications
First-class user-level threads
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
Scalable queue-based spin locks with timeout
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
SEDA: an architecture for well-conditioned, scalable internet services
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
OpenMP on networks of workstations
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Profile-directed optimization of event-based programs
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Queue Locks on Cache Coherent Multiprocessors
Proceedings of the 8th International Symposium on Parallel Processing
Xen and the art of virtualization
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Capriccio: scalable threads for internet services
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Best of Both Latency and Throughput
ICCD '04 Proceedings of the IEEE International Conference on Computer Design
Concurrency and Computation: Practice & Experience - 2002 ACM Java Grande—ISCOPE Conference Part I
McRT-STM: a high performance software transactional memory system for a multi-core runtime
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
The java.util.concurrent synchronizer framework
Science of Computer Programming - Special issue: Concurrency and synchronization in Java programs
McRT-Malloc: a scalable transactional memory allocator
Proceedings of the 5th international symposium on Memory management
Compiler and runtime support for efficient software transactional memory
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Flash: an efficient and portable web server
ATEC '99 Proceedings of the annual conference on USENIX Annual Technical Conference
Towards effective user-controlled scheduling for microkernel-based systems
ACM SIGOPS Operating Systems Review
Streamware: programming general-purpose multicore processors using streams
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Thread scheduling for multi-core platforms
HOTOS'07 Proceedings of the 11th USENIX workshop on Hot topics in operating systems
Pillar: A Parallel Implementation Language
Languages and Compilers for Parallel Computing
Programming model for a heterogeneous x86 platform
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Adapting application execution in CMPs using helper threads
Journal of Parallel and Distributed Computing
Exploiting fine-grain thread parallelism on multicore architectures
Scientific Programming - Software Development for Multi-core Computing Systems
Mining tree-structured data on multicore systems
Proceedings of the VLDB Endowment
Hera-JVM: abstracting processor heterogeneity behind a virtual machine
HotOS'09 Proceedings of the 12th conference on Hot topics in operating systems
Tessellation: space-time partitioning in a manycore client OS
HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
Corey: an operating system for many cores
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Design principles for end-to-end multicore schedulers
HotPar'10 Proceedings of the 2nd USENIX conference on Hot topics in parallelism
Hera-JVM: a runtime system for heterogeneous multi-core architectures
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Comparing scalability prediction strategies on an SMP of CMPs
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
TM2C: a software transactional memory for many-cores
Proceedings of the 7th ACM european conference on Computer Systems
BWS: balanced work stealing for time-sharing multicores
Proceedings of the 7th ACM european conference on Computer Systems
Server-based scheduling of parallel real-time tasks
Proceedings of the tenth ACM international conference on Embedded software
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
A lightweight VMM on many core for high performance computing
Proceedings of the 9th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Tessellation: refactoring the OS around explicit resource containers with continuous adaptation
Proceedings of the 50th Annual Design Automation Conference
HARS: A hardware-assisted runtime software for embedded many-core architectures
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Hi-index | 0.00 |
Hardware trends suggest that large-scale CMP architectures, with tens to hundreds of processing cores on a single piece of silicon, are iminent within the next decade. While existing CMP machines have traditionally been handled in the same way as SMPs, this magnitude of parallelism introduces several fundamental challenges at the architectural level and this, in turn, translates to novel challenges in the design of the software stack for these platforms. This paper presents the "Many Core Run Time" (McRT), a software prototype of an integrated language runtime that was designed to explore configurations of the software stack for enabling performance and scalability on large scale CMP platforms. This paper presents the architecture of McRT and discusses our experiences with the system, including experimental evaluation that lead to several interesting, non-intuitive findings, providing key insights about the structure of the system stack at this scale. A key contribution of this paper is to demonstrate how McRT enables near linear improvements in performance and scalability for desktop workloads such as the popular XviD encoder and a set of RMS (recognition, mining, and synthesis) applications. Another key contribution of this work is its use of McRT to explore non-traditional system configurations such as a light-weight executive in which McRT runs on "bare metal" and replaces the traditional OS. Such configurations are becoming an increasingly attractive alternative to leverage heterogeneous computing uints as seen in today's CPU-GPU configurations.