Tuning Automation¶
To address the unfairness that can arise from the tuning procedure, DeepOBS includes a tuning automation. Here, we describe how to use it. We also provide some basic functionality to monitor the tuning process; it is not explained here, but can be found in the API section of the Tuner. We further describe a comparative and fair usage of the tuning automation in the Suggested Protocol.
We provide three different Tuner classes: GridSearch, RandomSearch and GP (a Bayesian optimization method with a Gaussian process surrogate). You can find detailed information about them in the API section Tuner. All examples in this section are shown for the PyTorch framework.
Grid Search¶
To perform an automated grid search, you first have to create the Tuner instance. The optimizer class and its hyperparameters have to be specified in the same way as for Runners. Additionally, you have to provide a dictionary that holds the discrete values of each hyperparameter. By default, calling tune will execute the whole tuning process sequentially on the given hardware. If you want to parallelize the tuning process, you can use the method generate_commands_script. It generates commands that can be sent to different nodes. If the format of the command string is not correct for your training or hyperparameters, you have to overwrite the methods _generate_kwargs_format_for_command_line and _generate_hyperparams_format_for_command_line of the ParallelizedTuner accordingly. Note that the generated commands refer to a run script that you have to specify on your own. Here, as an example, the generated commands refer to a standard SGD script (a sketch of such a run script is shown after the example below):
from deepobs.tuner import GridSearch
from torch.optim import SGD
import numpy as np
from deepobs.pytorch.runners import StandardRunner

optimizer_class = SGD
hyperparams = {"lr": {"type": float},
               "momentum": {"type": float},
               "nesterov": {"type": bool}}

# The discrete values to construct a grid for.
grid = {'lr': np.logspace(-5, 2, 6),
        'momentum': [0.5, 0.7, 0.9],
        'nesterov': [False, True]}

# Make sure to set the amount of resources to the grid size. For grid search, this is just a sanity check.
tuner = GridSearch(optimizer_class, hyperparams, grid, runner=StandardRunner, ressources=6*3*2)

# Tune (i.e. evaluate every grid point) and rerun the best setting with 10 different seeds.
tuner.tune('quadratic_deep', rerun_best_setting=True, num_epochs=2, output_dir='./grid_search')

# Optionally, generate commands for a parallelized execution.
tuner.generate_commands_script('quadratic_deep', run_script='../SGD.py', num_epochs=2, output_dir='./grid_search', generation_dir='./grid_search_commands')
You can download this example and use it as a template.
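The generated commands point to a run script that you provide yourself (here ../SGD.py). Its exact contents depend on your setup, but a minimal sketch, assuming the StandardRunner is used as in the example and falls back to parsing the test problem, hyperparameters and training parameters from the command line (see the Runner documentation for the exact behavior), could look like this:

# SGD.py -- minimal run script the generated commands can refer to (illustrative sketch).
from torch.optim import SGD
from deepobs.pytorch.runners import StandardRunner

optimizer_class = SGD
hyperparams = {"lr": {"type": float},
               "momentum": {"type": float},
               "nesterov": {"type": bool}}

runner = StandardRunner(optimizer_class, hyperparams)
# Without explicit arguments, the runner reads the test problem, hyperparameters
# and training parameters from the command line, which is what the generated
# commands supply.
runner.run()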
Random Search¶
For the random search, you have to provide a dictionary that holds the sampling distribution for each hyperparameter:
from deepobs.tuner import RandomSearch
from torch.optim import SGD
from deepobs.tuner.tuner_utils import log_uniform
from scipy.stats.distributions import uniform, binom
from deepobs import config
from deepobs.pytorch.runners import StandardRunner

optimizer_class = SGD
hyperparams = {"lr": {"type": float},
               "momentum": {"type": float},
               "nesterov": {"type": bool}}

# Define the distributions to sample from.
distributions = {'lr': log_uniform(-5, 2),
                 'momentum': uniform(0.5, 0.5),
                 'nesterov': binom(1, 0.5)}

# Allow 36 random evaluations.
tuner = RandomSearch(optimizer_class, hyperparams, distributions, runner=StandardRunner, ressources=36)

# Tune (i.e. evaluate 36 different random samples) and rerun the best setting with 10 different seeds.
tuner.tune('quadratic_deep', rerun_best_setting=True, num_epochs=2, output_dir='./random_search')

# Optionally, generate commands for a parallelized execution.
tuner.generate_commands_script('quadratic_deep', run_script='../SGD.py', num_epochs=2, output_dir='./random_search', generation_dir='./random_search_commands')
You can download this example and use it as a template.
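The log_uniform utility used for the learning rate is documented in the Tuner API. Conceptually, sampling log-uniformly over the same range as the grid above means drawing the exponent uniformly and exponentiating, so that all orders of magnitude between 1e-5 and 1e2 are covered evenly. A small illustration of this idea, using scipy directly rather than the DeepOBS helper:

# Conceptual illustration of log-uniform sampling (not the DeepOBS implementation):
# draw the exponent uniformly on [-5, 2] and exponentiate.
import numpy as np
from scipy.stats.distributions import uniform

exponent_dist = uniform(loc=-5, scale=7)  # uniform on [-5, 2]
learning_rates = 10 ** exponent_dist.rvs(size=5, random_state=0)
print(learning_rates)  # samples spread over several orders of magnitude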
Bayesian Optimization (GP)¶
The Bayesian optimization method with a Gaussian process surrogate is more complex. First, you have to specify the bounds for the suggestions. Additionally, you can set a transformation of the search space for each hyperparameter. In combination with the bounds, this can be used to rescale the kernel or to optimize discrete hyperparameters:
from deepobs.tuner import GP
from torch.optim import SGD
from sklearn.gaussian_process.kernels import Matern
from deepobs import config
from deepobs.pytorch.runners import StandardRunner

optimizer_class = SGD
hyperparams = {"lr": {"type": float},
               "momentum": {"type": float},
               "nesterov": {"type": bool}}

# The bounds for the suggestions.
bounds = {'lr': (-5, 2),
          'momentum': (0.5, 1),
          'nesterov': (0, 1)}

# Corresponds to rescaling the kernel in log space.
def lr_transform(lr):
    return 10**lr

# Nesterov is discrete but will be suggested as a continuous value.
def nesterov_transform(nesterov):
    return bool(round(nesterov))

# The transformations of the search space. Momentum does not need a transformation.
transformations = {'lr': lr_transform,
                   'nesterov': nesterov_transform}

tuner = GP(optimizer_class, hyperparams, bounds, runner=StandardRunner, ressources=36, transformations=transformations)

# Tune with a Matern kernel and rerun the best setting with 10 different seeds.
tuner.tune('quadratic_deep', kernel=Matern(nu=2.5), rerun_best_setting=True, num_epochs=2, output_dir='./gp_tuner')
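To make the role of the transformations concrete: the GP suggests values inside the bounds given above, and the transformations turn them into the values that are actually passed to the optimizer. A small illustration using the functions defined in the example:

print(lr_transform(-3.0))        # 0.001: the suggestion lives in log space
print(nesterov_transform(0.73))  # True: the continuous suggestion is rounded to a bool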
You can download this example and use it as a template. Since Bayesian optimization is sequential by nature, we do not offer a parallelized version of it.