Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search over the same 32-dimensional configuration space found statistically equal performance on four of seven data sets, and superior performance on one of seven. A Gaussian process analysis of the function from hyper-parameters to validation set performance reveals that for most data sets only a few of the hyper-parameters really matter, but that different hyper-parameters are important on different data sets. This phenomenon makes grid search a poor choice for configuring algorithms for new data sets. Our analysis casts some light on why recent "High Throughput" methods achieve surprising success--they appear to search through a large number of hyper-parameters because most hyper-parameters do not matter much. We anticipate that growing interest in large hierarchical models will place an increasing burden on techniques for hyper-parameter optimization; this work shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.
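
To make the abstract's geometric argument concrete, the following minimal Python sketch (not from the paper; the toy objective, hyper-parameter names, and budgets are hypothetical) compares a 3x3 grid with 9 random trials when only one of two hyper-parameters actually affects validation error, roughly the situation the Gaussian process analysis identifies.

```python
# Minimal sketch, assuming a toy two-hyper-parameter problem where only
# `important` matters. With 9 trials, grid search tests just 3 distinct
# values of the important hyper-parameter, while random search tests 9.
import itertools
import random

def validation_error(important, unimportant):
    # Hypothetical objective: performance depends almost entirely on `important`.
    return (important - 0.73) ** 2 + 1e-6 * unimportant

# Grid search: 3 x 3 = 9 trials, only 3 distinct values per axis.
grid_axis = [0.0, 0.5, 1.0]
grid_trials = [validation_error(a, b)
               for a, b in itertools.product(grid_axis, grid_axis)]

# Random search: 9 trials, 9 distinct values of the important hyper-parameter.
rng = random.Random(0)
random_trials = [validation_error(rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0))
                 for _ in range(9)]

print("best grid error:  ", min(grid_trials))
print("best random error:", min(random_trials))
```

Under the same budget of 9 trials, the grid's projection onto the important axis collapses to 3 points, whereas the random trials project onto 9 distinct points, which is the intuition behind random search finding as-good-or-better models at a fraction of the cost.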