In this paper we present SpiceC, an approach to parallel programming. SpiceC simplifies parallel programming by combining an intuitive computation model with compiler directives. The SpiceC computation model consists of multiple threads, each with a private space for its data; threads share data through a shared space. Each thread computes in its private space, which provides the isolation that makes speculative computation possible. SpiceC provides easy-to-use compiler directives with which programmers can express different forms of parallelism. Developers express high-level constraints on data transfers between spaces, while the compiler performs the tedious task of generating the transfer code. SpiceC also supports data transfers involving dynamic data structures without help from developers, and it lets developers create clusters of data to enable parallel data transfers. SpiceC programs are portable across modern chip-multiprocessor machines, whether or not they support cache coherence. We have developed implementations of SpiceC for shared-memory systems with and without cache coherence. We evaluate our implementation using seven benchmarks, four of which are parallelized speculatively. Our compiler-generated implementations achieve speedups ranging from 2x to 18x on a 24-core system.
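The private-space and shared-space model described in the abstract can be illustrated with a small sketch in plain C. This is not SpiceC code and uses none of its directives, which the abstract does not show. Every name here (`shared_space`, `private_space`, `copy_in`, `try_commit`, `run_two_logical_threads`) is hypothetical, and the version-counter conflict check is an assumed stand-in for whatever misspeculation detection SpiceC actually uses. To keep the outcome deterministic, two logical threads are interleaved by hand on a single OS thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; none of this is SpiceC's actual API. */

/* Shared space: one integer plus a version bumped on every commit. */
typedef struct {
    int value;
    unsigned version;
} shared_space;

/* Private space: a local copy and the version it was copied from. */
typedef struct {
    int value;
    unsigned seen_version;
} private_space;

/* Copy-in: snapshot the shared data into the private space. */
static void copy_in(private_space *p, const shared_space *s) {
    p->value = s->value;
    p->seen_version = s->version;
}

/* Compute in isolation: only the private copy is touched. */
static void compute(private_space *p, int delta) {
    p->value += delta;
}

/* Commit: publish the private result only if nobody has committed since
   copy-in; otherwise report a conflict so the caller can re-execute.
   (Assumed check; a truly concurrent version would need this step to be
   atomic, e.g. under a lock.) */
static bool try_commit(const private_space *p, shared_space *s) {
    if (p->seen_version != s->version) {
        return false;            /* misspeculation: private result discarded */
    }
    s->value = p->value;
    s->version++;
    return true;
}

/* Interleave two logical threads by hand so the conflict is deterministic.
   Returns the number of re-executions that were needed. */
int run_two_logical_threads(shared_space *s) {
    private_space a, b;
    int retries = 0;

    copy_in(&a, s);              /* both logical threads snapshot the same state */
    copy_in(&b, s);
    compute(&a, 5);
    compute(&b, 7);

    bool a_ok = try_commit(&a, s);   /* A commits first, so it succeeds */
    assert(a_ok);
    (void)a_ok;

    while (!try_commit(&b, s)) {     /* B copied in before A's commit */
        retries++;
        copy_in(&b, s);              /* re-execute on fresh shared data */
        compute(&b, 7);
    }
    return retries;
}
```

Starting from a shared value of 10, the second logical thread's first commit is rejected because the first thread committed in between; after one re-execution the shared value is 22. The point of the sketch is only that isolation in a private space makes a failed speculation cheap to discard: nothing in the shared space has to be undone.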