Compiler techniques for reducing data cache miss rate on a multithreaded architecture

Authors:
Subhradyuti Sarkar;Dean M. Tullsen
Affiliations:
Department of Computer Science and Engineering, University of California, San Diego;Department of Computer Science and Engineering, University of California, San Diego
Venue:
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Year:
2008

Citing 15
Cited 14

The effect of page allocation on caches

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Column-associative caches: a technique for reducing the miss rate of direct-mapped caches

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Simultaneous multithreading: maximizing on-chip parallelism

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Data transformations for eliminating conflict misses

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Cache-conscious data placement

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Randomized Cache Placement for Eliminating Conflicts

IEEE Transactions on Computers - Special issue on cache memory and related problems
Handling long-latency loads in a simultaneous multithreading processor

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Skewed-associative Caches

PARLE '93 Proceedings of the 5th International PARLE Conference on Parallel Architectures and Languages Europe
Compiling for instruction cache performance on a multithreaded architecture

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Power-Sensitive Multithreaded Architecture

ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
ATOM: a system for building customized program analysis tools

ACM SIGPLAN Notices - Best of PLDI 1979-1999
Understanding the energy efficiency of simultaneous multithreading

Proceedings of the 2004 international symposium on Low power electronics and design
Conjoined-Core Chip Multiprocessing

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Dynamic capacity-speed tradeoffs in SMT processor caches

HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers

Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Cache topology aware computation mapping for multicores

PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
A workload-aware mapping approach for data-parallel programs

Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Array regrouping on CMP with non-uniform cache sharing

LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Data layout for cache performance on a multithreaded architecture

Transactions on high-performance embedded architectures and compilers III
Studying inter-core data reuse in multicores

Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
The impact of memory subsystem resource sharing on datacenter applications

Proceedings of the 38th annual international symposium on Computer architecture
Studying inter-core data reuse in multicores

ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review
Trace-Based data layout optimizations for multi-core processors

HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Combining locality analysis with online proactive job co-scheduling in chip multiprocessors

HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Neighborhood-aware data locality optimization for NoC-based multicores

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Compiler-assisted preferred caching for embedded systems with STT-RAM based hybrid cache

Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
MAC: migration-aware compilation for STT-RAM based hybrid cache in embedded systems

Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Reshaping cache misses to improve row-buffer locality in multicore systems

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques

Quantified Score

Hi-index	0.00

Visualization

Abstract

High performance embedded architectures will in some cases combine simple caches and multithreading, two techniques that increase energy efficiency and performance at the same time. However, that combination can produce high and unpredictable cache miss rates, even when the compiler optimizes the data layout of each program for the cache. This paper examines data-cache aware compilation for multithreaded architectures. Data-cache aware compilation finds a layout for data objects which minimizes inter-object conflict misses. This research extends and adapts prior cache-conscious data layout optimizations to the much more difficult environment of multithreaded architectures. Solutions are presented for two computing scenarios: (1) the more general case where any application can be scheduled along with other applications, and (2) the case where the co-scheduled working set is more precisely known.