Multi-Task Bayesian Optimization
Swersky, Kevin and Snoek, Jasper and Adams, Ryan Prescott
2013

Paper summary (nipsreviews)

This paper presents a multi-task Bayesian optimization approach to setting the hyper-parameters of machine learning models. In particular, it builds on previous work on multi-task Gaussian process (GP) learning with decomposable covariance functions and on Bayesian optimization of expensive cost functions. Previous work has shown that decomposable covariance functions can be useful in multi-task regression problems (e.g. \cite{conf/nips/BonillaCW07}) and that Bayesian optimization based on response surfaces can be useful for tuning the hyper-parameters of machine learning algorithms \cite{conf/nips/SnoekLA12} \cite{conf/icml/BergstraYC13}.
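The decomposable covariance can be illustrated with a small multi-task GP. This is a minimal sketch, not the authors' implementation: it assumes a squared-exponential kernel over inputs and a fixed, hand-chosen task covariance `K_task` (in practice both would be learned), and the function names are hypothetical. The covariance between input x on task t and input x' on task t' is `K_task[t, t'] * k_x(x, x')`, so observations on one task inform predictions on a correlated task.

```python
import numpy as np

def rbf(X1, X2, lengthscale=1.0):
    """Squared-exponential covariance over inputs (an assumed choice for illustration)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def multitask_kernel(X1, t1, X2, t2, K_task, lengthscale=1.0):
    """Decomposable covariance: k((x, t), (x', t')) = K_task[t, t'] * k_x(x, x')."""
    return K_task[np.ix_(t1, t2)] * rbf(X1, X2, lengthscale)

def gp_posterior(X, t, y, Xs, ts, K_task, noise=1e-4, lengthscale=1.0):
    """Zero-mean GP posterior mean and variance at inputs Xs on tasks ts."""
    K = multitask_kernel(X, t, X, t, K_task, lengthscale) + noise * np.eye(len(y))
    Ks = multitask_kernel(Xs, ts, X, t, K_task, lengthscale)
    # rbf(x, x) = 1, so the prior variance at (x, t) is simply K_task[t, t].
    prior_var = np.diag(K_task)[ts]
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = prior_var - (v**2).sum(0)
    return mean, var

# Task 0 has one observation; task 1 (the "new" task) has none.
K_task = np.array([[1.0, 0.9], [0.9, 1.0]])
X, t, y = np.array([[0.0]]), np.array([0]), np.array([1.0])
mean, var = gp_posterior(X, t, y, np.array([[0.0]]), np.array([1]), K_task)
```

With a task correlation of 0.9, the single observation y = 1 on task 0 gives task 1 a posterior mean of about 0.9 at the same input and a variance well below its prior value of 1; with an identity `K_task`, task 1 would keep its prior mean 0 and variance 1.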
The paper combines the decomposable covariance assumption \cite{conf/nips/BonillaCW07}, under which the covariance between two (input, task) pairs is the product of a covariance over tasks and a covariance over inputs, with Bayesian optimization based on expected improvement \cite{journals/jgo/Jones01} and entropy search (Hennig and Schuler, 2012). It shows empirically that it is possible to:
1. Transfer optimization knowledge across related problems, e.g. to address the cold-start problem of starting each new optimization from scratch
2. Optimize an aggregate of several objective functions, such as the average validation error over cross-validation folds, with applications to speeding up cross-validation
3. Use information from a smaller, cheaper problem to optimize a larger problem faster
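Point 2 works because, under a multi-task GP that treats each fold as a task, the per-fold validation errors at a candidate setting are jointly Gaussian, so their average is also Gaussian and a standard acquisition function such as expected improvement can be applied to it. A minimal sketch, assuming the joint posterior mean `mu` and covariance `Sigma` over the K folds have already been computed by such a model; the function names are hypothetical and the objective (validation error) is minimized.

```python
import math
import numpy as np

def average_predictive(mu, Sigma):
    """Mean and std of the fold average (1/K) * sum_k f_k when f ~ N(mu, Sigma)."""
    K = len(mu)
    w = np.full(K, 1.0 / K)          # equal weight per fold
    mean = float(w @ mu)
    var = float(w @ Sigma @ w)       # w^T Sigma w = 1^T Sigma 1 / K^2
    return mean, math.sqrt(max(var, 0.0))

def expected_improvement(mean, std, best):
    """EI for minimization: E[max(best - f, 0)] with f ~ N(mean, std^2)."""
    if std <= 0.0:
        return max(best - mean, 0.0)
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf

# Three folds whose predicted errors are perfectly correlated.
mu = np.array([0.20, 0.22, 0.18])
m, s = average_predictive(mu, 0.01 * np.ones((3, 3)))
ei = expected_improvement(m, s, best=0.2)
```

Correlation between folds matters: perfectly correlated folds leave the averaged prediction as uncertain as a single fold, while independent folds shrink its variance by a factor of K.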
Positive experimental results are reported on synthetic data (the Branin-Hoo function), on optimizing logistic regression hyper-parameters, and on optimizing the hyper-parameters of online LDA on real data.
