Near Fine Grain Parallel Processing Using Static Scheduling on Single Chip Multiprocessors

Authors:
Keiji Kimura; Hironori Kasahara
Venue:
IWIA '99 Proceedings of the 1999 International Workshop on Innovative Architecture
Year:
1999

Abstract

As the number of transistors integrated on a chip increases, using those transistors efficiently and scaling the effective performance of a processor are becoming important problems. Popular superscalar and VLIW architectures, however, are expected to have difficulty achieving scalable improvement of effective performance in the future because of the limits of instruction level parallelism. To cope with this problem, a single chip multiprocessor (SCM) approach with multi grain parallel processing inside a chip, which hierarchically exploits loop parallelism and coarse grain parallelism among subroutines, loops and basic blocks in addition to instruction level parallelism, is considered one of the most promising approaches. As the first step of research on SCM architectures that support multi grain parallel processing, this paper evaluates the effectiveness of single chip multiprocessor architectures with a shared cache, global registers, distributed shared memory and/or local memory for near fine grain parallel processing. The evaluation shows that the OSCAR (Optimally Scheduled Advanced Multiprocessor) architecture, which has distributed shared memory and local memory in addition to centralized shared memory, with global registers attached, gives significant speedups of 13.8% to 143.8% on four processors compared with the shared cache architecture, for applications from which parallelism has been difficult to extract effectively.