MASA: a multithreaded processor architecture for parallel symbolic computing
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Symbiotic jobscheduling with priorities for a simultaneous multithreading processor
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Network Processor Design: Issues and Practices
Network Processor Design: Issues and Practices
Observed structure of addresses in IP traffic
Proceedings of the 2nd ACM SIGCOMM Workshop on Internet measurment
A pipelined memory architecture for high throughput network processors
Proceedings of the 30th annual international symposium on Computer architecture
APRIL: A PROCESSOR ARCHITECTURE FOR MULTIPROCESSING
APRIL: A PROCESSOR ARCHITECTURE FOR MULTIPROCESSING
Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance
Proceedings of the 31st annual international symposium on Computer architecture
Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
A comparison of interactivity in the Linux 2.6 scheduler and an MLFQ scheduler
Software—Practice & Experience
Operating system scheduling for chip multithreaded processors
Operating system scheduling for chip multithreaded processors
Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Measuring Operating System Overhead on CMT Processors
SBAC-PAD '08 Proceedings of the 2008 20th International Symposium on Computer Architecture and High Performance Computing
HASS: a scheduler for heterogeneous multicore systems
ACM SIGOPS Operating Systems Review
Thread to strand binding of parallel network applications in massive multi-threaded systems
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Cache index-aware memory allocation
Proceedings of the international symposium on Memory management
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Optimal task assignment in multithreaded processors: a statistical approach
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Assessing the suitability of the NGMP multi-core processor in the space domain
Proceedings of the tenth ACM international conference on Embedded software
L1-bandwidth aware thread allocation in multicore SMT processors
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Hi-index | 0.00 |
Thread level parallelism (TLP) has become a popular trend to improve processor performance, overcoming the limitations of extracting instruction level parallelism. Each TLP paradigm, such as Simultaneous Multithreading or Chip-Multiprocessors, provides different benefits, which has motivated processor vendors to combine several TLP paradigms in each chip design. Even if most of these combined-TLP designs are homogeneous, they present different levels of hardware resource sharing, which introduces complexities on the operating system scheduling and load balancing. Commonly, processor designs provide two levels of resource sharing: Inter-core in which only the highest levels of the cache hierarchy are shared, and Intra-core in which most of the hardware resources of the core are shared. Recently, Sun Microsystems has released the UltraSPARC T2, a processor with three levels of hardware resource sharing: InterCore, IntraCore, and IntraPipe. In this work, we provide the first characterization of a three-level resource sharing processor, the UltraSPARC T2, and we show how multi-level resource sharing affects the operating system design. We further identify the most critical hardware resources in the T2 and the characteristics of applications that are not sensitive to resource sharing. Finally, we present a case study in which we run a real multithreaded network application, showing that a resource sharing aware scheduler can improve the system throughput up to 55%.