Run-Time Parallelization and Scheduling of Loops
IEEE Transactions on Computers
Limits of control flow on parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Transactional memory: architectural support for lock-free data structures
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Proceedings of the fourteenth annual ACM symposium on Principles of distributed computing
ICS '98 Proceedings of the 12th international conference on Supercomputing
Task selection for a multiscalar processor
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Data speculation support for a chip multiprocessor
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
TEST: a tracer for extracting speculative threads
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
In Search of Speculative Thread-Level Parallelism
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
The Jrpm system for dynamically parallelizing Java programs
Proceedings of the 30th annual international symposium on Computer architecture
Language support for lightweight transactions
OOPSLA '03 Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programing, systems, languages, and applications
Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
POSH: a TLS compiler that exploits program structure
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
McRT-STM: a high performance software transactional memory system for a multi-core runtime
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Implicit parallelism with ordered transactions
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
An effective hybrid transactional memory system with strong isolation guarantees
Proceedings of the 34th annual international symposium on Computer architecture
Performance pathologies in hardware transactional memory
Proceedings of the 34th annual international symposium on Computer architecture
Hardware atomicity for reliable software speculation
Proceedings of the 34th annual international symposium on Computer architecture
Optimistic parallelism requires abstractions
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Understanding Tradeoffs in Software Transactional Memory
Proceedings of the International Symposium on Code Generation and Optimization
Design and implementation of the HPCS graph analysis benchmark on symmetric multiprocessors
HiPC'05 Proceedings of the 12th international conference on High Performance Computing
Compiler-Driven Dependence Profiling to Guide Program Parallelization
Languages and Compilers for Parallel Computing
How much parallelism is there in irregular applications?
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
An efficient transactional memory algorithm for computing minimum spanning forest of sparse graphs
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Parallelization spectroscopy: analysis of thread-level parallelism in hpc programs
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
An analytic framework for performance modeling of software transactional memory
Computer Networks: The International Journal of Computer and Telecommunications Networking
Modeling software transactional memory with AnyLogic
Proceedings of the 2nd International Conference on Simulation Tools and Techniques
Alchemist: A Transparent Dependence Distance Profiling Infrastructure
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
An Analytic Model for Optimistic STM with Lazy Locking
ASMTA '09 Proceedings of the 16th International Conference on Analytical and Stochastic Modeling Techniques and Applications
Modeling transactional memory workload performance
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Exploring the limits of disjoint access parallelism
HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
SD3: A Scalable Approach to Dynamic Data-Dependence Profiling
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Hybrid binary rewriting for memory access instrumentation
Proceedings of the 7th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Robust adaptation to available parallelism in transactional memory applications
Transactions on high-performance embedded architectures and compilers III
Kremlin: rethinking and rebooting gprof for the multicore age
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
QUAD: a memory access pattern analyser
ARC'10 Proceedings of the 6th international conference on Reconfigurable Computing: architectures, Tools and Applications
Profiling Data-Dependence to Assist Parallelization: Framework, Scope, and Optimization
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Online dynamic dependence analysis for speculative polyhedral parallelization
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Hi-index | 0.00 |
This work presents a quantitative approach to analyze parallelization opportunities in programs with irregular memory access where potential data dependencies mask available parallelism. The model captures data and causal dependencies among critical sections as algorithmic properties and quantifies them as a density computed over the number of executed instructions. The model abstracts from runtime aspects such as scheduling, the number of threads, and concurrency control used in a particular parallelization. We illustrate the model on several applications requiring ordered and unordered execution of critical sections. We describe a run-time tool that computes the dependence densities from a deterministic single-threaded program execution. This density metric provides insights into the potential for optimistic parallelization, opportunities for algorithmic scheduling, and performance defects due to synchronization bottlenecks. Based on the results of our analysis, we classify applications into three categories with low, medium, and high dependence densities. Applications with low dependence density are naturally good candidates for optimistic concurrency, applications with medium density may require a scheduler that is aware of the algorithmic dependencies for optimistic concurrency to be effective, and applications with high dependence density may not be suitable for parallelization.