Distributed order scheduling and its application to multi-core DRAM controllers

Authors:
Thomas Moscibroda;Onur Mutlu
Affiliations:
Microsoft Research, Redmond, WA, USA;Microsoft Research, Redmond, WA, USA
Venue:
Proceedings of the twenty-seventh ACM symposium on Principles of distributed computing
Year:
2008

Abstract

We study a distributed version of the order scheduling problem that arises when scheduling memory requests in shared DRAM systems of many-core architectures. In this problem, a set of n customer orders must be scheduled on multiple facilities. An order can consist of multiple requests, each of which has to be serviced on one designated facility, and an order is completed only when all its requests have been serviced. In the distributed setting, every facility has its own request buffer and must schedule its requests with only limited knowledge of the buffer state at other facilities. In this paper, we quantify the trade-off between the amount of communication among facilities and the quality of the resulting global solution. We show that without communication, the average completion time of all orders can be a factor of Ω(√n) worse than in the optimal schedule. On the other hand, there exists a 2-approximation algorithm if the complete buffer states are exchanged in n communication rounds. We then prove a general upper bound that characterizes the region between these extreme points. Specifically, we devise a distributed scheduling algorithm that, for any k, achieves an approximation ratio of O(k) in n/k communication rounds. Finally, we empirically test the performance of our different algorithms in a many-core environment using SPEC CPU2006 benchmarks as well as Windows desktop application traces.
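The objective described in the abstract, where an order completes only when its last request has been serviced on any facility, can be illustrated with a small sketch. This is not the paper's algorithm. It assumes unit-time requests, that every facility serves orders in the same global priority order, and a hypothetical encoding in which `orders[o][f]` is the number of requests order `o` has on facility `f`. The names `completion_times` and `average_completion_time` are illustrative only.

```python
from typing import Dict, List

# Hypothetical encoding: orders[o][f] = number of unit-time requests
# that order o places on facility f (assumption, not from the paper).
Orders = Dict[str, Dict[int, int]]


def completion_times(orders: Orders, priority: List[str]) -> Dict[str, int]:
    """Serve orders on every facility in the same global priority order.

    An order's completion time is the time at which its last request
    finishes, taken over all facilities it uses.
    """
    clock: Dict[int, int] = {}  # elapsed busy time per facility
    done: Dict[str, int] = {}
    for o in priority:
        finish = 0
        for facility, load in orders[o].items():
            clock[facility] = clock.get(facility, 0) + load
            finish = max(finish, clock[facility])
        done[o] = finish
    return done


def average_completion_time(orders: Orders, priority: List[str]) -> float:
    """Average completion time over all orders for a given priority order."""
    done = completion_times(orders, priority)
    return sum(done.values()) / len(done)


# Three orders on two facilities (illustrative data only).
orders: Orders = {
    "A": {0: 3, 1: 1},
    "B": {0: 1},
    "C": {1: 2},
}

short_first = average_completion_time(orders, ["B", "C", "A"])
long_first = average_completion_time(orders, ["A", "B", "C"])
print(short_first, long_first)
```

In this toy instance, serving the short orders B and C first gives a lower average completion time than serving A first, which is the kind of gap a facility cannot always avoid when it sees only its own buffer.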