Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Parallel-stage decoupled software pipelining
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
MAPS: an integrated framework for MPSoC application parallelization
Proceedings of the 45th annual Design Automation Conference
SoC-C: efficient programming abstractions for heterogeneous multicore systems on chip
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Proceedings of the 4th workshop on Declarative aspects of multicore programming
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Copy or Discard execution model for speculative parallelization on multicores
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
ARCS '09 Proceedings of the 22nd International Conference on Architecture of Computing Systems
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Alchemist: A Transparent Dependence Distance Profiling Infrastructure
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Multi-execution: multicore caching for data-similar executions
Proceedings of the 36th annual international symposium on Computer architecture
SPARTAN: A software tool for Parallelization Bottleneck Analysis
IWMSE '09 Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering
Speculative parallelization using software multi-threaded transactions
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Speculative parallelization of sequential loops on multicores
International Journal of Parallel Programming
Exposing parallelism and locality in a runtime parallel optimization framework
Proceedings of the 7th ACM international conference on Computing frontiers
A profile-based tool for finding pipeline parallelism in sequential programs
Parallel Computing
The Paralax infrastructure: automatic parallelization with a helping hand
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Lime: a Java-compatible and synthesizable language for heterogeneous architectures
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Concurrent separation logic for pipelined parallelization
SAS'10 Proceedings of the 17th international conference on Static analysis
Scalable Speculative Parallelization on Commodity Clusters
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Parallelism and data movement characterization of contemporary application classes
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
Commutative set: a language extension for implicit parallel programming
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Parallelism orchestration using DoPE: the degree of parallelism executive
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
ALTER: exploiting breakable dependences for parallelization
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Parallel programming of general-purpose programs using task-based programming models
HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism
Localizing globals and statics to make C programs thread-safe
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Proceedings of the International Conference on Computer-Aided Design
Dataflow execution of sequential imperative programs on multicore architectures
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Dynamic trace-based analysis of vectorization potential of applications
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Fast loop-level data dependence profiling
Proceedings of the 26th ACM international conference on Supercomputing
Multi-slicing: a compiler-supported parallel approach to data dependence profiling
Proceedings of the 2012 International Symposium on Software Testing and Analysis
Fast on-line statistical learning on a GPGPU
AusPDC '11 Proceedings of the Ninth Australasian Symposium on Parallel and Distributed Computing - Volume 118
Parallelizing Sequential Programs with Statistical Accuracy Tests
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on Probabilistic Embedded Computing
Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential
ACM Transactions on Architecture and Code Optimization (TACO)
Integrating profile-driven parallelism detection and machine-learning-based mapping
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Single-threaded programming is already considered a complicated task. The move to multi-threaded programming only increases the complexity and cost involved in software development due to rewriting legacy code, training of the programmer, increased debugging of the program, and ef- forts to avoid race conditions, deadlocks, and other prob- lems associated with parallel programming. To address these costs, other approaches, such as automatic thread ex- traction, have been explored. Unfortunately, the amount of parallelism that has been automatically extracted is gener- ally insufficient to keep many cores busy. This paper argues that this lack of parallelism is not an intrinsic limitation of the sequential programming model, but rather occurs for two reasons. First, there exists no framework for automatic thread extraction that brings to- gether key existing state-of-the-art compiler and hardware techniques. This paper shows that such a framework can yield scalable parallelization on several SPEC CINT2000 benchmarks. Second, existing sequential programming lan- guages force programmers to define a single legal program outcome, rather than allowing for a range of legal out- comes. This paper shows that natural extensions to the se- quential programming model enable parallelization for the remainder of the SPEC CINT2000 suite. Our experience demonstrates that, by changing only 60 source code lines, all of the C benchmarks in the SPEC CINT2000 suite were parallelizable by automatic thread extraction. This process, constrained by the limits of modern optimizing compilers, yielded a speedup of 454% on these applications.