Sequential programming models express a total program order, of which only a partial order must be respected. This inhibits parallelizing tools from extracting scalable performance. Programmer-written semantic commutativity assertions provide a natural way of relaxing this partial order, thereby exposing parallelism implicitly in a program. Existing implicit parallel programming models based on semantic commutativity either require additional programming extensions or have limited expressiveness. This paper presents a generalized programming extension based on semantic commutativity, called Commutative Set (COMMSET), and associated compiler technology that enables multiple forms of parallelism. COMMSET expressions are syntactically succinct and enable the programmer to specify commutativity relations between groups of arbitrary structured code blocks. Using only this construct, serializing constraints that inhibit parallelization can be relaxed, independent of any particular parallelization strategy or concurrency control mechanism. COMMSET enables well-performing parallelizations in cases where they were previously inapplicable or non-performing. By extending eight sequential programs with only eight annotations per program on average, COMMSET and the associated compiler technology produced a geomean speedup of 5.7x on eight cores, compared to 1.5x for the best non-COMMSET parallelization.
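The core idea is easiest to see on a concrete data structure: two insertions into an unordered set commute semantically (either execution order yields the same set membership) even though they do not commute at the memory level (the internal layout depends on call order). The C sketch below illustrates that distinction under stated assumptions; set_insert and set_contains are illustrative helpers, and the annotation mentioned in the comments is a hypothetical COMMSET-style assertion, not the paper's actual syntax.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* A set as a singly linked list of keys; not from the paper, just an
 * illustration of semantic vs. memory-level commutativity. */
typedef struct Node { const char *key; struct Node *next; } Node;
typedef struct { Node *head; } Set;

/* Insert key if absent. Two calls to set_insert commute semantically:
 * either order yields the same set membership, though the internal list
 * order (and hence the raw memory state) depends on call order. A
 * COMMSET-style assertion would let the programmer state this to the
 * compiler; the concrete syntax here is hypothetical. */
static void set_insert(Set *s, const char *key) {
    for (Node *n = s->head; n; n = n->next)
        if (strcmp(n->key, key) == 0) return;      /* already present */
    Node *n = malloc(sizeof *n);
    n->key = key;
    n->next = s->head;                             /* prepend */
    s->head = n;
}

static int set_contains(const Set *s, const char *key) {
    for (Node *n = s->head; n; n = n->next)
        if (strcmp(n->key, key) == 0) return 1;
    return 0;
}

int main(void) {
    Set a = {0}, b = {0};

    /* The same two insertions in opposite orders. */
    set_insert(&a, "x"); set_insert(&a, "y");
    set_insert(&b, "y"); set_insert(&b, "x");

    /* Observable behavior is identical even though a.head and b.head chain
     * the nodes differently; this is the relaxation of the sequential order
     * that a commutativity assertion exposes to a parallelizing compiler. */
    printf("a contains x,y: %d %d\n", set_contains(&a, "x"), set_contains(&a, "y"));
    printf("b contains x,y: %d %d\n", set_contains(&b, "x"), set_contains(&b, "y"));

    free(a.head->next); free(a.head);
    free(b.head->next); free(b.head);
    return 0;
}

With such an assertion in place, a compiler is free to reorder or interleave the annotated calls across threads, provided some concurrency control (for example, a lock around the set) preserves the atomicity of each call, which matches the abstract's claim that the relaxation is independent of any particular parallelization strategy or concurrency control mechanism.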