Loop Scheduling for Multithreaded Processors
PARELEC '04 Proceedings of the international conference on Parallel Computing in Electrical Engineering
Loop scheduling on multithreaded processors differs significantly from loop scheduling on other parallel processors. Because threads share hardware resources, the scheduler faces new constraints, but the same sharing also allows faster communication across threads. We present Coral 2000, a multithreaded processor model with hardware extensions that support Macro Software Pipelining, a loop scheduling technique for multithreaded processors. We evaluated Coral 2000 on a cycle-level simulator, using synthetic and integer SPEC benchmarks. On loops that exhibit limited parallelism, we obtained speedups of up to 30% over highly optimized superblock-based schedules.