Implicitly parallel programming models for thousand-core microprocessors

Authors:
Wen-mei Hwu;Shane Ryoo;Sain-Zee Ueng;John H. Kelm;Isaac Gelado;Sam S. Stone;Robert E. Kidd;Sara S. Baghsorkhi;Aqeel A. Mahesri;Stephanie C. Tsao;Nacho Navarro;Steve S. Lumetta;Matthew I. Frank;Sanjay J. Patel
Affiliations:
University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;Universitat Politecnica de Catalunya (UPC);University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;Universitat Politecnica de Catalunya (UPC);University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign
Venue:
Proceedings of the 44th annual Design Automation Conference
Year:
2007

Abstract

This paper argues for an implicitly parallel programming model for many-core microprocessors and presents initial technical approaches toward this goal. In an implicitly parallel programming model, programmers maximize algorithm-level parallelism, express their parallel algorithms by asserting high-level properties on top of a traditional sequential programming language, and rely on parallelizing compilers and hardware support to perform parallel execution under the hood. To derive a comprehensive understanding of the input program's concurrency, compilers and related tools in such a model require much more advanced program analysis capabilities and programmer assertions than are currently available. That understanding then drives automatic or interactive parallel code generation tools for a diverse set of parallel hardware organizations. The chip-level architecture and hardware should maintain parallel execution state in such a way that a strictly sequential execution state can always be derived for the purpose of verifying and debugging the program. We argue that implicitly parallel programming models are critical for addressing the software development crises and software scalability challenges for many-core microprocessors.
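As a rough illustration of the idea described in the abstract, the sketch below shows a sequential-style loop body annotated with a programmer assertion, plus a stand-in for the tooling that chooses parallel execution only when the assertion is present. Everything here is an assumption made for illustration: the `independent_iterations` marker, the `run_loop` helper, and the use of a thread pool are hypothetical and are not the paper's notation, compiler, or hardware mechanism. The paper envisions compile-time analysis, while this toy makes the decision at run time.

```python
from concurrent.futures import ThreadPoolExecutor


def independent_iterations(fn):
    """Hypothetical programmer assertion: calls to fn on different
    inputs do not interfere, so they may execute in any order."""
    fn.independent = True
    return fn


@independent_iterations
def scale(x):
    # Ordinary sequential code; it contains no threading constructs.
    return 3 * x + 1


def run_loop(fn, items, workers=4):
    """Stand-in for the compiler/runtime: parallelize only if asserted."""
    if getattr(fn, "independent", False):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Executor.map returns results in input order, so the
            # outcome matches what a sequential loop would produce.
            return list(pool.map(fn, items))
    return [fn(x) for x in items]


def sequential_reference(fn, items):
    """Plain sequential execution, used to check the parallel result."""
    return [fn(x) for x in items]


data = list(range(8))
assert run_loop(scale, data) == sequential_reference(scale, data)
```

The final assertion mirrors, in a very loose sense, the requirement that a strictly sequential result can always be recovered for verification and debugging: the program text stays sequential, and the parallel run must agree with it.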