Early stopping and timeout in optuna – Passion is like genius; a miracle.

I have previously explained how to run optuna in parallel. Here, I’d like to explain how one can apply timeout and early stopping.

First thing to understand is the use of Manager. It’s required to safely passing around variables in multiprocessing context. In the code below, success, pruneded, failure counters are passed so that multiprocessing optimization can access and update them safely.

        with Manager() as manager:
            self._start_time = time.time()
            self._n_success = manager.Value("i", 0)
            self._n_pruned = manager.Value("i", 0)
            self._n_failure = manager.Value("i", 0)

Actual use of those variables happens in a callback by callling optimize as below.

study.optimize(..., callbacks=[YourMonitoringClass(n_success=self._n_success, ...)]

Now, the monitoring class can receive those variables safely. And in your class’s __call__ , we call the functions below to decide early stop. Additionally, you can record # of success, failures, etc.

    def __call__(self, study: optuna.Study, trial: optuna.trial.FrozenTrial):
        match trial.state:
            case optuna.trial.TrialState.FAIL:
                self._n_failure.value += 1
                status_to_log = Status.FAILED
                log_level = logging.ERROR
            case optuna.trial.TrialState.PRUNED:
                self._n_pruned.value += 1
                status_to_log = Status.PRUNED
                log_level = logging.INFO
            case optuna.trial.TrialState.COMPLETE:
                self._n_success.value += 1
                status_to_log = Status.FINISHED
                log_level = logging.INFO
        n_finished = (
            self._n_success.value + self._n_pruned.value + self._n_failure.value
        )
        ... log progress ...
        self._may_timeout(study)
        self._may_early_stop(study)

    def _may_timeout(self, study: optuna.Study):
        if time.time() - self._start_time > self._timeout:
            ... log that this is timeout ...
            study.stop()

    def _may_early_stop(self, study: optuna.Study):
        if self._early_stopping_config is None:
            return
        trials = [t.value for t in study.trials if t.value is not None]
        if len(trials) < self._early_stopping_config.patience:
            return
        recent_trials = trials[-self._early_stopping_config.patience :]
        recent_std = np.std(recent_trials)
        if recent_std < self._early_stopping_config.threshold:
            ... log that this is early stop ...
            study.stop()

Also, keep in mind that you need to early stop using validation data.