AlphaSort: a RISC machine sort
SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology
ICS '97 Proceedings of the 11th international conference on Supercomputing
Computer architecture (2nd ed.): a quantitative approach
Computer architecture (2nd ed.): a quantitative approach
The art of computer programming, volume 3: (2nd ed.) sorting and searching
The art of computer programming, volume 3: (2nd ed.) sorting and searching
A fast Fourier transform compiler
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Optimizing for reduced code space using genetic algorithms
Proceedings of the ACM SIGPLAN 1999 workshop on Languages, compilers, and tools for embedded systems
Implementing Quicksort programs
Communications of the ACM
SPL: a language and compiler for DSP algorithms
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Genetic Algorithms in Search, Optimization and Machine Learning
Genetic Algorithms in Search, Optimization and Machine Learning
Stochastic search for signal processing algorithm optimization
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Cache Conscious Algorithms for Relational Query Processing
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Finding effective optimization phase sequences
Proceedings of the 2003 ACM SIGPLAN conference on Language, compiler, and tool for embedded systems
Meta optimization: improving compiler heuristics with machine learning
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
A Dynamically Tuned Sorting Library
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Evolutionary Computation
Classifier fitness based on accuracy
Evolutionary Computation
Automatic Tuning Matrix Multiplication Performance on Graphics Hardware
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Optimizing data permutations for SIMD devices
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Online performance auditing: using hot optimizations without getting burned
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Using SIMD registers and instructions to enable instruction-level parallelism in sorting algorithms
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Adaptive Strassen's matrix multiplication
Proceedings of the 21st annual international conference on Supercomputing
An experimental study of sorting and branch prediction
Journal of Experimental Algorithmics (JEA)
Adaptive Winograd's matrix multiplications
ACM Transactions on Mathematical Software (TOMS)
PetaBricks: a language and compiler for algorithmic choice
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Automated tuning in parallel sorting on multi-core architectures
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
Communications of the ACM
Optimizing matrix multiplication with a classifier learning system
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
A transactional memory with automatic performance tuning
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Mitigating the compiler optimization phase-ordering problem using machine learning
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Hi-index | 0.02 |
The growing complexity of modern processors has made the generation of highly efficient code increasingly difficult. Manual code generation is very time consuming, but it is often the only choice since the code generated by today's compiler technology often has much lower performance than the best hand-tuned codes. A promising code generation strategy, implemented by systems like ATLAS, FFTW, and SPIRAL, uses empirical search to find the parameter values of the implementation, such as the tile size and instruction schedules, that deliver near-optimal performance for a particular machine. However, this approach has only proven successful on scientific codes whose performance does not depend on the input data. In this paper we study machine learning techniques to extend empirical search to the generation of sorting routines, whose performance depends on the input characteristics and the architecture of the target machine. We build on a previous study that selects a "pure" sorting algorithm at the outset of the computation as a function of the standard deviation. The approach discussed in this paper uses genetic algorithms and a classifier system to build hierarchically-organized hybrid sorting algorithms capable of adapting to the input data. Our results show that such algorithms generated using the approach presented in this paper are quite effective at taking into account the complex interactions between architectural and input data characteristics and that the resulting code performs significantly better than conventional sorting implementations and the code generated by our earlier study. In particular, the routines generated using our approach perform better than all the commercial libraries that we tried including IBM ESSL, INTEL MKL and the C++ STL. The best algorithm we have been able to generate is on the average 26%and 62% faster than the IBM ESSL in an IBM Power 3 and IBM Power 4, respectively.