A logarithmic time sort for linear size networks
Journal of the ACM (JACM)
Solving problems on concurrent processors. Vol. 1: General techniques and regular problems
Solving problems on concurrent processors. Vol. 1: General techniques and regular problems
An overview for the PTRAN analysis system for multiprocessing
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
Congestion avoidance and control
SIGCOMM '88 Symposium proceedings on Communications architectures and protocols
The definition of Standard ML
Systems programming with Modula-3
Systems programming with Modula-3
Active messages: a mechanism for integrated communication and computation
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The network architecture of the Connection Machine CM-5 (extended abstract)
SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures
Profile-driven compilation
Model-driven mapping of computation onto distributed memory parallel computers
Model-driven mapping of computation onto distributed memory parallel computers
Model-driven mapping onto distributed memory parallel computers
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
LogP: towards a realistic model of parallel computation
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Programming models for irregular applications
ACM SIGPLAN Notices - Workshop on languages, compilers and run-time environments for distributed memory multiprocessors
Anatomy of a message in the Alewife multiprocessor
ICS '93 Proceedings of the 7th international conference on Supercomputing
Portable high-performance supercomputing: high-level platform-dependent optimization
Portable high-performance supercomputing: high-level platform-dependent optimization
Parallel performance prediction using lost cycles analysis
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
How to Get Good Performance from the CM-5 Data Network
Proceedings of the 8th International Symposium on Parallel Processing
PRELUDE: A SYSTEM FOR PORTABLE PARALL
PRELUDE: A SYSTEM FOR PORTABLE PARALL
Adapting to network and client variability via on-demand dynamic distillation
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Programming language requirements for the next millennium
ACM Computing Surveys (CSUR) - Special issue: position statements on strategic directions in computing research
Dynamic feedback: an effective technique for adaptive computing
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Eliminating synchronization overhead in automatically parallelized programs using dynamic feedback
ACM Transactions on Computer Systems (TOCS)
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Application-level scheduling on distributed heterogeneous networks
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Stochastic search for signal processing algorithm optimization
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Performance Modeling and Composition: A Case Study in Cell Simulation
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
A Performance Estimator for Parallel Programs
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Performance Prediction with Benchmaps
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Scheduling From the Perspective of the Application
HPDC '96 Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing
Predicting the Running Times of Parallel Programs by Simulation
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Learning to construct fast signal processing implementations
The Journal of Machine Learning Research
Parallel program performance prediction using deterministic task graph analysis
ACM Transactions on Computer Systems (TOCS)
Architecture of an automatically tuned linear algebra library
Parallel Computing
A framework for adaptive algorithm selection in STAPL
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Ligature: Component Architecture for High Performance Applications
International Journal of High Performance Computing Applications
Statistical Models for Empirical Search-Based Performance Tuning
International Journal of High Performance Computing Applications
SmartApps: middle-ware for adaptive applications on reconfigurable platforms
ACM SIGOPS Operating Systems Review
An Adaptive Algorithm Selection Framework for Reduction Parallelization
IEEE Transactions on Parallel and Distributed Systems
Measuring empirical computational complexity
Proceedings of the the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
Automatic Communication Performance Debugging in PGAS Languages
Languages and Compilers for Parallel Computing
Empirical hardness models: Methodology and a case study on combinatorial auctions
Journal of the ACM (JACM)
PetaBricks: a language and compiler for algorithmic choice
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Performance modeling for dynamic algorithm selection
ICCS'03 Proceedings of the 2003 international conference on Computational science
Self-adapting numerical software and automatic tuning of heuristics
ICCS'03 Proceedings of the 2003 international conference on Computational science
Self-adapting numerical software and automatic tuning of heuristics
ICCS'03 Proceedings of the 2003 international conference on Computational science
Practical performance models of algorithms in evolutionary program induction and other domains
Artificial Intelligence
A case for machine learning to optimize multicore performance
HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
Comparing machine learning approaches for context-aware composition
SC'11 Proceedings of the 10th international conference on Software composition
Optimizing matrix multiplication with a classifier learning system
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Language and compiler support for auto-tuning variable-accuracy algorithms
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Optimized composition of performance-aware parallel components
Concurrency and Computation: Practice & Experience
An automated approach to generating efficient constraint solvers
Proceedings of the 34th International Conference on Software Engineering
Adaptation of legacy codes to context-aware composition using aspect-oriented programming
SC'12 Proceedings of the 11th international conference on Software Composition
Mantis: automatic performance prediction for smartphone applications
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Algorithm runtime prediction: Methods & evaluation
Artificial Intelligence
Hi-index | 0.00 |
We develop the use of statistical modeling for portable high-level optimizations such as data layout and algorithm selection. We build the models automatically from profiling information, which ensures robust and accurate models that reflect all aspects of the target platform.We use the models to select among several data layouts for an iterative PDE solver and to select among several sorting algorithms. The selection is correct more than 99% of the time on each of four platforms. In the few cases it selects suboptimally, the selected implementation performs nearly as well; that is, it always makes at least a very good choice. Correct selection is platform and workload dependent and can improve performance by nearly a factor of three.We also use the models to optimize parameters of these applications automatically. In all cases, the models predicted the optimal parameter setting, resulting in improvements ranging up to a factor of three.Finally, we use the models to construct portable high-level libraries, which contain multiple implementations and support for automatic selection and parameter optimization of the fastest implementation for the target platform and workload.