Optimal Partitioning of Cache Memory
IEEE Transactions on Computers
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Computer architecture (2nd ed.): a quantitative approach
Computer architecture (2nd ed.): a quantitative approach
A Chip-Multiprocessor Architecture with Speculative Multithreading
IEEE Transactions on Computers
Selective cache ways: on-demand cache resource allocation
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Application-specific memory management for embedded systems using software-controlled caches
Proceedings of the 37th Annual Design Automation Conference
Symbiotic jobscheduling for a simultaneous multithreaded processor
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Pentium 4 Performance-Monitoring Features
IEEE Micro
Performance Tradeoffs in Multithreaded Processors
IEEE Transactions on Parallel and Distributed Systems
Supporting Fine-Grained Synchronization on a Simultaneous Multithreading Processor
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
A New Memory Monitoring Scheme for Memory-Aware Scheduling and Partitioning
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Dynamic Partitioning of Shared Cache Memory
The Journal of Supercomputing
Predictable performance in SMT processors
Proceedings of the 1st conference on Computing frontiers
VORPAL: a versatile plasma simulation code
Journal of Computational Physics
Architectural Support for Enhanced SMT Job Scheduling
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
A Low-Complexity, High-Performance Fetch Unit for Simultaneous Multithreading Processors
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
A multithreaded PowerPC processor for commercial servers
IBM Journal of Research and Development
Thread-associative memory for multicore and multithreaded computing
Proceedings of the 2006 international symposium on Low power electronics and design
Compositional, dynamic cache management for embedded chip multiprocessors
Proceedings of the conference on Design, automation and test in Europe
FlexDCP: a QoS framework for CMP architectures
ACM SIGOPS Operating Systems Review
Compositional, Dynamic Cache Management for Embedded Chip Multiprocessors
Journal of Signal Processing Systems
A majority-based control scheme for way-adaptable caches
Facing the multicore-challenge
Dynamic cache partitioning based on the MLP of cache misses
Transactions on high-performance embedded architectures and compilers III
A majority-based control scheme for way-adaptable caches
Facing the multicore-challenge
Proceedings of the 48th Design Automation Conference
Dynamic Cache Reconfiguration for Soft Real-Time Systems
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
Chip multi-processors (CMP) are rapidly emerging as an important design paradigm for both high performance and embedded processors. These machines provide an important performance alternative to increasing the clock frequency. In spite of the increase in potential performance, several issues related to resource sharing on the chip can negatively impact the performance of embedded applications. In particular, the shared on-chip caches make each job's memory access times dependent on the behavior of the other jobs sharing the cache. If not adequately managed, this can lead to problems in meeting hard real-time scheduling constraints. This work explores adaptable caching strategies which balance the resource demands of each application and in turn lead to improvements in throughput for the collective workload. Experimental results demonstrate speedups of up to 1.47X for workloads of two co-scheduled applications compared against a fully-shared two-level cache hierarchy. Additionally, the adaptable caching scheme is shown to achieve an average speedup of 1.10X over the leading cache partitioning model. By dynamically managing cache storage for multiple application threads at runtime, sizable performance levels are achieved, which provides chip designers the opportunity to maintain high performance as cache size and power budgets become a concern in the CMP design space.