Thread scheduling for cache locality

Authors and affiliations:
James Philbin, NEC Research Institute, 4 Independence Way, Princeton, NJ
Jan Edler, NEC Research Institute, 4 Independence Way, Princeton, NJ
Otto J. Anshus, Department of Computer Science, Institute of Mathematical and Physical Sciences, University of Tromso, N-9037 Tromso, Norway
Craig C. Douglas, IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY and Department of Computer Science, Yale University, P.O. Box 208285, New Haven, CT
Kai Li, Department of Computer Science, Princeton University, Princeton, NJ
Venue:
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Year:
1996

Abstract

This paper describes a method to improve the cache locality of sequential programs by scheduling fine-grained threads. The algorithm relies upon hints provided at the time of thread creation to determine a thread execution order likely to reduce cache misses. This technique may be particularly valuable when compiler-directed tiling is not feasible. Experiments with several application programs, on two systems with different cache structures, show that our thread scheduling method can improve program performance by reducing second-level cache misses.
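The abstract does not specify what the hints look like or how the execution order is derived from them. As a rough illustration only, the sketch below assumes each hint is a data address supplied when a thread is created, and that threads whose hints fall in the same cache-sized block are run back to back. The class name `HintScheduler`, its methods, and the `block_bytes` parameter are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


class HintScheduler:
    """Toy run-to-completion scheduler that groups threads by address hints.

    Hypothetical sketch: the binning rule is an assumption, not the
    algorithm described in the paper.
    """

    def __init__(self, block_bytes):
        # Assumed tuning knob: size of the address block that one bin covers,
        # chosen so that the data for one bin fits in the cache.
        self.block_bytes = block_bytes
        self.bins = defaultdict(list)  # bin key -> list of (function, args)

    def fork(self, fn, args, hints):
        # Map every hint address to a block index. Threads whose hints land
        # in the same blocks end up in the same bin.
        key = tuple(h // self.block_bytes for h in hints)
        self.bins[key].append((fn, args))

    def run(self):
        # Execute one bin at a time, in sorted key order, so that threads
        # touching the same blocks run consecutively.
        for key in sorted(self.bins):
            for fn, args in self.bins[key]:
                fn(*args)
        self.bins.clear()


# Usage: threads are created in an interleaved order but executed per block.
order = []
sched = HintScheduler(block_bytes=1024)
for addr in [0, 4096, 8, 4100, 16]:
    sched.fork(order.append, (addr,), hints=(addr,))
sched.run()
print(order)
```

In this toy version the threads are plain function calls that run to completion, which keeps creation and dispatch cheap enough for fine-grained work. A real implementation would also need to choose the block size from the target machine's second-level cache parameters.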