Safely exploiting multithreaded processors to tolerate memory latency in real-time systems

Authors:
Ali El-Haj-Mahmoud; Eric Rotenberg
Affiliations:
North Carolina State University, Raleigh, NC; North Carolina State University, Raleigh, NC
Venue:
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Year:
2004

Abstract

A coarse-grain multithreaded processor can effectively hide long memory latencies by quickly switching to an alternate task when the active task issues a memory request, improving overall throughput. However, dynamic switching cannot be safely exploited to improve throughput in hard-real-time embedded systems, because the schedulability of a task-set (a guarantee that all tasks meet their deadlines) must be determined a priori using offline schedulability tests, and any computation/memory overlap must therefore be statically accounted for. We develop a novel analytical framework that bounds the overlap between the computation of a pipeline-resident task and the ongoing memory transfers of other tasks. From it we derive a simple closed-form schedulability test that depends only on the aggregate computation (C) and memory (M) components of tasks. That is, the technique does not need to know where memory transfers occur within and among tasks, and it avoids searching all task permutations for a specific feasible schedule. To the best of our knowledge, this is the first work to provide the necessary formalism for safely and tractably exploiting coarse-grain multithreaded processors to tolerate memory latency in hard-real-time systems, exceeding the schedulability limits of classic real-time theory for uniprocessors. Our techniques make it possible to capitalize on higher-frequency embedded processors despite the widening processor-memory speed gap. Experiments with task-sets from the C-lab benchmarks show improved schedulability, measured as the ability to schedule previously infeasible task-sets or to reduce utilization for already feasible task-sets. We also demonstrate a proof of concept by deploying our method in a cycle-level simulator of an ARM11-like embedded microprocessor augmented with multiple register contexts, the same hardware multithreading support available in Ubicom's IP3023 embedded microprocessor.
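
To make the role of the aggregate C and M components concrete, the following is a minimal sketch of the baseline that the abstract compares against. It is not the paper's closed-form test, which the abstract does not state. It assumes periodic tasks with implicit deadlines scheduled by EDF on a single pipeline, so the classic uniprocessor condition is that the summed utilization of (C + M) / T does not exceed 1. The `Task` class, its field names, and the example numbers are hypothetical. The compute-only figure is shown only as an optimistic reference point (all memory time perfectly hidden), not as a sufficient test.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Task:
    """Hypothetical periodic task; deadline assumed equal to period."""
    period: int   # T: period
    compute: int  # C: aggregate worst-case computation time
    memory: int   # M: aggregate worst-case memory transfer time


def classic_utilization(tasks):
    """Utilization when memory time is charged in full (no overlap)."""
    return sum(Fraction(t.compute + t.memory, t.period) for t in tasks)


def compute_only_utilization(tasks):
    """Optimistic reference: utilization if all memory time were hidden."""
    return sum(Fraction(t.compute, t.period) for t in tasks)


def edf_feasible_without_overlap(tasks):
    """Classic EDF utilization test for a uniprocessor, assuming no overlap."""
    return classic_utilization(tasks) <= 1


# Hypothetical task-set: infeasible under the classic test, yet its
# computation alone would occupy only part of the pipeline.
tasks = [Task(period=10, compute=3, memory=3),
         Task(period=20, compute=6, memory=6)]

print(classic_utilization(tasks))           # utilization with M charged in full
print(compute_only_utilization(tasks))      # utilization if M were fully hidden
print(edf_feasible_without_overlap(tasks))  # verdict of the classic test
```

The gap between the two utilization figures is the room in which a safe overlap bound can help: a test that statically accounts for some computation/memory overlap can accept task-sets that the no-overlap test rejects, which is the kind of improvement the abstract reports.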