Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search over the same 32-dimensional configuration space found statistically equal performance on four of seven data sets, and superior performance on one of seven. A Gaussian process analysis of the function from hyper-parameters to validation set performance reveals that for most data sets only a few of the hyper-parameters really matter, but that different hyper-parameters are important on different data sets. This phenomenon makes grid search a poor choice for configuring algorithms for new data sets. Our analysis casts some light on why recent "High Throughput" methods achieve surprising success--they appear to search through a large number of hyper-parameters because most hyper-parameters do not matter much. We anticipate that growing interest in large hierarchical models will place an increasing burden on techniques for hyper-parameter optimization; this work shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.
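
To make the abstract's geometric argument concrete, the following minimal Python sketch (not from the paper; the toy objective, hyper-parameter names, and budgets are hypothetical) compares a 3x3 grid with 9 random trials when only one of two hyper-parameters actually affects validation error, roughly the situation the Gaussian process analysis identifies.

```python
# Minimal sketch, assuming a toy two-hyper-parameter problem where only
# `important` matters. With 9 trials, grid search tests just 3 distinct
# values of the important hyper-parameter, while random search tests 9.
import itertools
import random

def validation_error(important, unimportant):
    # Hypothetical objective: performance depends almost entirely on `important`.
    return (important - 0.73) ** 2 + 1e-6 * unimportant

# Grid search: 3 x 3 = 9 trials, only 3 distinct values per axis.
grid_axis = [0.0, 0.5, 1.0]
grid_trials = [validation_error(a, b)
               for a, b in itertools.product(grid_axis, grid_axis)]

# Random search: 9 trials, 9 distinct values of the important hyper-parameter.
rng = random.Random(0)
random_trials = [validation_error(rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0))
                 for _ in range(9)]

print("best grid error:  ", min(grid_trials))
print("best random error:", min(random_trials))
```

Under the same budget of 9 trials, the grid's projection onto the important axis collapses to 3 points, whereas the random trials project onto 9 distinct points, which is the intuition behind random search finding as-good-or-better models at a fraction of the cost.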