Hardware support for multithreaded execution of loops with limited parallelism

Authors:
Georgios Dimitriou;Constantine Polychronopoulos
Affiliations:
Dept. of Computer & Communications Engineering, University of Thessaly, Volos, Greece;Dept. of Electrical & Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois
Venue:
PCI'05 Proceedings of the 10th Panhellenic conference on Advances in Informatics
Year:
2005

Abstract

Loop scheduling on multithreaded processors differs significantly from loop scheduling on other parallel processors. The sharing of hardware resources imposes new scheduling limitations, but it also allows faster communication across threads. We present a multithreaded processor model, Coral 2000, with hardware extensions that support Macro Software Pipelining, a loop scheduling technique for multithreaded processors. We tested and evaluated Coral 2000 on a cycle-level simulator, using synthetic and integer SPEC benchmarks. On loops that exhibit limited parallelism, we obtained speedups of up to 30% with respect to highly optimized superblock-based schedules.