A single signal processing algorithm can be represented by many mathematically equivalent formulas. When these formulas are implemented in code and run on real machines, however, their runtimes differ widely, and this broad performance range is extremely difficult to model. Moreover, the space of formulas for real signal transforms is far too large to search exhaustively for fast implementations. We cast this search as a control learning problem and present a new method for learning to generate fast formulas, allowing us to search intelligently through only the most promising candidates. Our approach incorporates signal processing knowledge, hardware features, and measured formula performance data to learn to construct fast formulas. After training on performance data for a few formulas of one size, our method can construct formulas with the fastest runtimes possible across many sizes.
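The two claims above — many equivalent formulas, and a search space too large to enumerate — can be made concrete with a small sketch. This is not the paper's system; it is an illustrative example using the Walsh-Hadamard transform, a transform commonly used in this line of work. Each binary "split tree" over n = n1 + n2 yields a mathematically equivalent formula WHT_{2^n} = (WHT_{2^n1} ⊗ I_{2^n2})(I_{2^n1} ⊗ WHT_{2^n2}), and the number of such trees grows as the Catalan numbers. All function names here are my own for illustration.

```python
# Sketch: count the equivalent binary split-tree formulas for WHT_{2^n},
# and apply two different formulas to check they compute the same transform.
from functools import lru_cache


@lru_cache(maxsize=None)
def num_binary_split_trees(n):
    """Count distinct binary split trees (equivalent formulas) for WHT_{2^n}."""
    if n == 1:
        return 1  # leaf: compute WHT_2 directly
    # choose a split n = i + (n - i); subtrees are independent
    return sum(num_binary_split_trees(i) * num_binary_split_trees(n - i)
               for i in range(1, n))


def wht_direct(x):
    """One fixed fast-WHT formula: iterative in-place butterflies."""
    x = list(x)
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x


def tree_size(tree):
    """Vector length 2^n handled by a split tree (an int leaf means WHT_{2^n})."""
    if isinstance(tree, int):
        return 2 ** tree
    left, right = tree
    return tree_size(left) * tree_size(right)


def wht_tree(tree, x):
    """Apply the WHT formula described by `tree` to the vector x."""
    if isinstance(tree, int):
        return wht_direct(x)
    left, right = tree
    n2 = tree_size(right)
    n1 = len(x) // n2
    # (I_{N1} (x) WHT_{N2}): transform contiguous blocks of length N2
    y = []
    for i in range(n1):
        y.extend(wht_tree(right, x[i * n2:(i + 1) * n2]))
    # (WHT_{N1} (x) I_{N2}): transform strided subvectors
    out = y[:]
    for j in range(n2):
        out[j::n2] = wht_tree(left, y[j::n2])
    return out
```

Different trees produce identical outputs but touch memory in very different orders (contiguous blocks vs. long strides), which is exactly why their measured runtimes diverge on real cache hierarchies even though the arithmetic is equivalent. The count grows quickly: already for 2^10 points there are thousands of binary formulas, and allowing wider splits makes the space far larger still.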