A framework for determining useful parallelism
ICS '88 Proceedings of the 2nd international conference on Supercomputing
Performance issues in non-blocking synchronization on shared-memory multiprocessors
PODC '92 Proceedings of the eleventh annual ACM symposium on Principles of distributed computing
Array privatization for parallel execution of loops
ICS '92 Proceedings of the 6th international conference on Supercomputing
Transactional memory: architectural support for lock-free data structures
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Proceedings of the fourteenth annual ACM symposium on Principles of distributed computing
Asynchronous Iterative Methods for Multiprocessors
Journal of the ACM (JACM)
Concurrent reading and writing
Communications of the ACM
Automatically tuned linear algebra software
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
On Privatization of Variables for Data-Parallel Execution
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Embedded processor design challenges
Green: a framework for supporting energy-conscious programming using controlled approximation
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Patterns and statistical analysis for understanding reduced resource computing
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Randomized accuracy-aware program transformations for efficient approximate computations
POPL '12 Proceedings of the 39th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Proving acceptability properties of relaxed nondeterministic approximate programs
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Breaking the speed and scalability barriers for graph exploration on distributed-memory machines
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Paraprox: pattern-based approximation for data parallel applications
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
Synchronization overhead is a major bottleneck in scaling parallel applications to a large number of cores. This continues to be true in spite of various synchronization-reduction techniques that have been proposed. Previously studied synchronization-reduction techniques tacitly assume that all synchronizations specified in a source program are essential to guarantee quality of the results produced by the program. Recently there have been proposals to relax the synchronizations in a parallel program and compute approximate results. A fundamental challenge in using relaxed synchronization is guaranteeing that the relaxed program always produces results with a specified quality. We propose a methodology that addresses this challenge in programming with relaxed synchronization. Using our methodology programmers can systematically relax synchronization while always producing results that are of same quality as the original (un-relaxed) program. We demonstrate significant speedups using our methodology on a variety of benchmarks (e.g., up to 15x on KMeans benchmark, and up to 3x on a already highly tuned kernel from Graph500 benchmark).