Measuring Parallelism in Computation-Intensive Scientific/Engineering Applications
IEEE Transactions on Computers
A framework for determining useful parallelism
ICS '88 Proceedings of the 2nd international conference on Supercomputing
Quartz: a tool for tuning parallel program performance
SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Dynamic dependency analysis of ordinary programs
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Space-time scheduling of instruction-level parallelism on a raw machine
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
SUIF Explorer: an interactive and interprocedural parallelizer
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Timestamped whole program path representation and its applications
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
A microbenchmark suite for OpenMP 2.0
ACM SIGARCH Computer Architecture News - Special Issue: PACT 2001 workshops
Parallel Programming with Polaris
Computer
Interactive Parallel Programming using the ParaScope Editor
IEEE Transactions on Parallel and Distributed Systems
Loop-Level Parallelism in Numeric and Symbolic Programs
IEEE Transactions on Parallel and Distributed Systems
The Jrpm system for dynamically parallelizing Java programs
Proceedings of the 30th annual international symposium on Computer architecture
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
A cost-driven compilation framework for speculative parallelization of sequential programs
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
POSH: a TLS compiler that exploits program structure
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
X10: concurrent programming for modern architectures
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Valgrind: a framework for heavyweight dynamic binary instrumentation
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Efficient online detection of dynamic control dependence
Proceedings of the 2007 international symposium on Software testing and analysis
IEEE Transactions on Computers
Modeling optimistic concurrency using quantitative dependence analysis
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Compiler-Driven Dependence Profiling to Guide Program Parallelization
Languages and Compilers for Parallel Computing
How much parallelism is there in irregular applications?
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Effective performance measurement and analysis of multithreaded applications
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Alchemist: A Transparent Dependence Distance Profiling Infrastructure
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Fast Track: A Software System for Speculative Program Optimization
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Refactoring sequential Java code for concurrency via concurrent libraries
ICSE '09 Proceedings of the 31st International Conference on Software Engineering
Profiling Java programs for parallelism
IWMSE '09 Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering
Proceedings of the the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
The Cilk++ concurrency platform
Proceedings of the 46th Annual Design Automation Conference
Manipulating lossless video in the compressed domain
MM '09 Proceedings of the 17th ACM international conference on Multimedia
SD-VBS: The San Diego Vision Benchmark Suite
IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC)
Interfacing computer aided parallelization and performance analysis
ICCS'03 Proceedings of the 2003 international conference on Computational science
Umbra: efficient and scalable memory shadowing
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
The Cilkview scalability analyzer
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Understanding parallelism-inhibiting dependences in sequential Java programs
ICSM '10 Proceedings of the 2010 IEEE International Conference on Software Maintenance
SD3: A Scalable Approach to Dynamic Data-Dependence Profiling
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Kremlin: like gprof, but for parallelization
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Parkour: parallel speedup estimates for serial programs
HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism
Kismet: parallel speedup estimates for serial programs
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Dynamic trace-based analysis of vectorization potential of applications
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Fast loop-level data dependence profiling
Proceedings of the 26th ACM international conference on Supercomputing
Multi-slicing: a compiler-supported parallel approach to data dependence profiling
Proceedings of the 2012 International Symposium on Software Testing and Analysis
Harmony: collection and analysis of parallel block vectors
Proceedings of the 39th Annual International Symposium on Computer Architecture
Automatic parallelization using autofutures
MSEPT'12 Proceedings of the 2012 international conference on Multicore Software Engineering, Performance, and Tools
Optimization coaching: optimizers learn to communicate with programmers
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Profiling Data-Dependence to Assist Parallelization: Framework, Scope, and Optimization
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Bottle graphs: visualizing scalability bottlenecks in multi-threaded applications
Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential
ACM Transactions on Architecture and Code Optimization (TACO)
Integrating profile-driven parallelism detection and machine-learning-based mapping
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Many recent parallelization tools lower the barrier for parallelizing a program, but overlook one of the first questions that a programmer needs to answer: which parts of the program should I spend time parallelizing? This paper examines Kremlin, an automatic tool that, given a serial version of a program, will make recommendations to the user as to what regions (e.g. loops or functions) of the program to attack first. Kremlin introduces a novel hierarchical critical path analysis and develops a new metric for estimating the potential of parallelizing a region: self-parallelism. We further introduce the concept of a parallelism planner, which provides a ranked order of specific regions to the programmer that are likely to have the largest performance impact when parallelized. Kremlin supports multiple planner personalities, which allow the planner to more effectively target a particular programming environment or class of machine. We demonstrate the effectiveness of one such personality, an OpenMP planner, by comparing versions of programs that are parallelized according to Kremlin's plan against third-party manually parallelized versions. The results show that Kremlin's OpenMP planner is highly effective, producing plans whose performance is typically comparable to, and sometimes much better than, manual parallelization. At the same time, these plans would require that the user parallelize significantly fewer regions of the program.