Critical Block Scheduling: A Thread-Level Parallelizing Mechanism for a Heterogeneous Chip Multiprocessor Architecture

Authors: Slo-Li Chu
Affiliations: Department of Information and Computer Engineering, Chung Yuan Christian University, Chung-Li, Taiwan, R.O.C.
Venue: Languages and Compilers for Parallel Computing
Year: 2007

Abstract

Processor-in-Memory (PIM) architectures target high-performance computing by integrating processing units and memory blocks on a single chip, which narrows the performance gap between processor and memory. A PIM architecture combines heterogeneous processors in a single system, and these processors differ in their computation and memory-access capabilities. A mechanism is therefore needed to identify those capabilities and dispatch appropriate tasks to each heterogeneous processing element. This paper presents such a parallelizing mechanism, called Critical Block Scheduling, which aims to fully utilize all of the heterogeneous processors in the PIM architecture. Integrated with our thread-level parallelizing system, Octans, the mechanism decomposes the original program into blocks, builds the corresponding dependence graph, creates a feasible execution schedule, and generates the corresponding threads for the host and memory processors. Critical Block Scheduling can parallelize programs not only for PIM architectures but also for other Multi-Processor System-on-Chip (MPSoC) and Chip Multiprocessor (CMP) architectures that consist of multiple heterogeneous processors. Experimental results on real benchmarks are also discussed.
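
The abstract does not spell out the scheduling algorithm itself, so the following is only an illustrative sketch of the general idea, not the paper's method. It uses a generic critical-path list scheduler: blocks are ranked by the longest best-case path to the end of the dependence graph, then each block is placed on whichever processor finishes it earliest. The function name `critical_path_schedule`, the processor labels `host` and `mem`, the per-block cost table, and the omission of communication costs are all assumptions made for this example.

```python
from collections import defaultdict


def critical_path_schedule(costs, deps, processors=("host", "mem")):
    """Hypothetical list scheduler for blocks on heterogeneous processors.

    costs: {block: {processor: execution_time}}, all times positive
    deps:  {block: [predecessor blocks]}
    Returns a list of (block, processor, start, end) tuples.
    """
    # Invert the dependence graph so each block knows its successors.
    succs = defaultdict(list)
    for blk, preds in deps.items():
        for pred in preds:
            succs[pred].append(blk)

    # Rank = best-case cost of the block plus the longest best-case path below it.
    rank = {}

    def upward_rank(blk):
        if blk not in rank:
            below = max((upward_rank(s) for s in succs[blk]), default=0)
            rank[blk] = min(costs[blk].values()) + below
        return rank[blk]

    for blk in costs:
        upward_rank(blk)

    proc_free = {p: 0 for p in processors}  # time at which each processor is idle
    finish = {}                             # finish time of each scheduled block
    schedule = []

    # With positive costs a predecessor always outranks its successors,
    # so descending rank visits blocks in a valid dependence order.
    for blk in sorted(costs, key=lambda b: (-rank[b], b)):
        ready = max((finish[p] for p in deps.get(blk, [])), default=0)
        # Pick the processor with the earliest finish time (ties broken by name).
        best = min(
            processors,
            key=lambda p: (max(ready, proc_free[p]) + costs[blk][p], p),
        )
        start = max(ready, proc_free[best])
        end = start + costs[blk][best]
        proc_free[best] = end
        finish[blk] = end
        schedule.append((blk, best, start, end))
    return schedule


# Made-up example: four blocks, A feeds B and C, which both feed D.
example_costs = {
    "A": {"host": 2, "mem": 4},
    "B": {"host": 6, "mem": 3},
    "C": {"host": 3, "mem": 5},
    "D": {"host": 2, "mem": 2},
}
example_deps = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}

for entry in critical_path_schedule(example_costs, example_deps):
    print(entry)
```

In this made-up example, block B is memory-bound (cheaper on `mem`) and runs on the memory processor in parallel with block C on the host. A real system would also need to model data-transfer cost between processors and derive the per-block costs from program analysis rather than from a hand-written table.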