The Omega test: a fast and practical integer programming algorithm for dependence analysis
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
In search of clusters (2nd ed.)
In search of clusters (2nd ed.)
Information Technology-Portable Operating System Interface
Information Technology-Portable Operating System Interface
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
StreamIt: A Language for Streaming Applications
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Compiler optimization-space exploration
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Itanium 2 Processor Microarchitecture
IEEE Micro
Multiple Instruction Stream Processor
Proceedings of the 33rd annual international symposium on Computer Architecture
Extending Multicore Architectures to Exploit Hybrid Parallelism in Single-thread Applications
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Patterns for parallel programming
Patterns for parallel programming
Automatic Discovery of Coarse-Grained Parallelism in Media Applications
Transactions on High-Performance Embedded Architectures and Compilers I
The spec# programming system: an overview
CASSIS'04 Proceedings of the 2004 international conference on Construction and Analysis of Safe, Secure, and Interoperable Smart Devices
Parallelizing CAD: a timely research agenda for EDA
Proceedings of the 45th annual Design Automation Conference
A performance study of general-purpose applications on graphics processors using CUDA
Journal of Parallel and Distributed Computing
DMP: deterministic shared memory multiprocessing
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Data parallel dialect of scheme: outline of the formal model, implementation, performance
Proceedings of the 2009 ACM symposium on Applied Computing
Research on Evaluation of Parallelization on an Embedded Multicore Platform
APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
A program auto-parallelizer based on the component technology of optimizing compiler construction
Programming and Computing Software
The Paralax infrastructure: automatic parallelization with a helping hand
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Task superscalar: using processors as functional units
HotPar'10 Proceedings of the 2nd USENIX conference on Hot topics in parallelism
Optimal Utilization of Heterogeneous Resources for Biomolecular Simulations
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Commutative set: a language extension for implicit parallel programming
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
ALTER: exploiting breakable dependences for parallelization
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
A survey of the practice of computational science
State of the Practice Reports
Auto-generation and auto-tuning of 3D stencil codes on GPU clusters
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Towards software performance engineering for multicore and manycore systems
ACM SIGMETRICS Performance Evaluation Review
Hi-index | 0.00 |
This paper argues for an implicitly parallel programming model for many-core microprocessors, and provides initial technical approaches towards this goal. In an implicitly parallel programming model, programmers maximize algorithm-level parallelism, express their parallel algorithms by asserting high-level properties on top of a traditional sequential programming language, and rely on parallelizing compilers and hardware support to perform parallel execution under the hood. In such a model, compilers and related tools require much more advanced program analysis capabilities and programmer assertions than what are currently available so that a comprehensive understanding of the input program's concurrency can be derived. Such an understanding is then used to drive automatic or interactive parallel code generation tools for a diverse set of parallel hardware organizations. The chip-level architecture and hardware should maintain parallel execution state in such a way that a strictly sequential execution state can always be derived for the purpose of verifying and debugging the program. We argue that implicitly parallel programming models are critical for addressing the software development crises and software scalability challenges for many-core microprocessors.