[BUG] sporadic test failure of `test_evaluator` `test_plots` on Mac Python 3.7

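**Describe the bug**

`test_plots` in `benchmarking/tests/test_evaluator.py` failed sporadically on the macOS Python 3.7.12 CI job: the pytest-xdist worker `gw0` crashed while running it, so the log contains no traceback for the test itself.

Here is the failed run: https://github.com/alan-turing-institute/sktime/runs/5788191259?check_suite_focus=true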

```
=================================== FAILURES ===================================
_____________________ benchmarking/tests/test_evaluator.py _____________________
[gw0] darwin -- Python 3.7.12 /Users/runner/hostedtoolcache/Python/3.7.12/x64/bin/python
worker 'gw0' crashed while running 'benchmarking/tests/test_evaluator.py::test_plots'
=============================== warnings summary ===============================
../../../../hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/coverage/inorout.py:472
../../../../hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/coverage/inorout.py:472
../../../../hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/coverage/inorout.py:472
../../../../hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/coverage/inorout.py:472
  /Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/coverage/inorout.py:472: CoverageWarning: --include is ignored because --source is set (include-ignored)
    self.warn("--include is ignored because --source is set", slug="include-ignored")

benchmarking/tests/test_experiments.py::test_run_clustering_experiment
  /Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/tslearn/utils/utils.py:156: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
  Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
    def to_time_series_dataset(dataset, dtype=numpy.float):

benchmarking/tests/test_experiments.py::test_run_clustering_experiment
  /Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/tslearn/utils/cast.py:15: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
  Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
    def to_sklearn_dataset(dataset, dtype=numpy.float, return_dim=False):

benchmarking/tests/test_experiments.py::test_run_clustering_experiment
  /Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/tslearn/metrics/utils.py:9: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
  Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
    compute_diagonal=True, dtype=numpy.float, *args, **kwargs):

benchmarking/tests/test_experiments.py::test_run_clustering_experiment
  /Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/tslearn/metrics/sax.py:2: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
  Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
    from .cysax import cydist_sax

benchmarking/tests/test_experiments.py::test_run_clustering_experiment
  /Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/tslearn/metrics/sax.py:2: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
  Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
    from .cysax import cydist_sax

benchmarking/tests/test_experiments.py::test_run_clustering_experiment
  /Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/tslearn/metrics/__init__.py:24: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
  Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
    from .cycc import cdist_normalized_cc, y_shifted_sbd_vec

benchmarking/tests/test_experiments.py: 4530 warnings
clustering/tests/test_k_shapes.py: 2093 warnings
tests/test_all_estimators.py: 30028 warnings
clustering/tests/test_kernel_k_means.py: 18730 warnings

tests/test_all_estimators.py: 28 warnings
  /Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/sktime/transformations/series/boxcox.py:377: RuntimeWarning: invalid value encountered in power
    x_ratio = x_std / x_mean ** (1 - lmb)

tests/test_all_estimators.py::TestAllEstimators::test_fit_idempotent[LogTransformer-TransformerFitTransformPanelUnivariateWithClassY]
tests/test_all_estimators.py::TestAllEstimators::test_methods_do_not_change_state[LogTransformer-TransformerFitTransformPanelUnivariateWithClassY]
tests/test_all_estimators.py::TestAllEstimators::test_methods_have_no_side_effects[LogTransformer-TransformerFitTransformPanelUnivariateWithClassY]
tests/test_all_estimators.py::TestAllEstimators::test_persistence_via_pickle[LogTransformer-TransformerFitTransformPanelUnivariateWithClassY]
  /Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/sktime/transformations/series/boxcox.py:250: RuntimeWarning: invalid value encountered in log
    Xt = np.log(X)

transformations/tests/test_all_transformers.py::test_all_transformers[HampelFilter]
  /Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/numpy/lib/nanfunctions.py:1120: RuntimeWarning: All-NaN slice encountered
    overwrite_input=overwrite_input)

transformations/tests/test_all_transformers.py::test_all_transformers[TransformedTargetForecaster]
  /Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/sklearn/utils/extmath.py:985: RuntimeWarning: invalid value encountered in true_divide
    updated_mean = (last_sum + new_sum) / updated_sample_count

transformations/tests/test_all_transformers.py::test_all_transformers[TransformedTargetForecaster]
  /Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/sklearn/utils/extmath.py:990: RuntimeWarning: invalid value encountered in true_divide
    T = new_sum / new_sample_count

transformations/tests/test_all_transformers.py::test_all_transformers[TransformedTargetForecaster]
  /Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/sklearn/utils/extmath.py:1020: RuntimeWarning: invalid value encountered in true_divide
    new_unnormalized_variance -= correction ** 2 / new_sample_count

performance_metrics/forecasting/_classes.py: 8 warnings
performance_metrics/forecasting/_functions.py: 8 warnings
performance_metrics/tests/test_performance_metrics_forecasting.py: 60 warnings
  /Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/sktime/performance_metrics/forecasting/_functions.py:1802: FutureWarning: In the percentage error metric functions the default argument symmetric=True is changing to symmetric=False in v0.12.0.
    FutureWarning,

performance_metrics/forecasting/_classes.py: 8 warnings
performance_metrics/forecasting/_functions.py: 8 warnings
performance_metrics/tests/test_performance_metrics_forecasting.py: 60 warnings
  /Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/sktime/performance_metrics/forecasting/_functions.py:1669: FutureWarning: In the percentage error metric functions the default argument symmetric=True is changing to symmetric=False in v0.12.0.
    FutureWarning,

performance_metrics/forecasting/_classes.py: 8 warnings
performance_metrics/forecasting/_functions.py: 8 warnings
performance_metrics/tests/test_performance_metrics_forecasting.py: 60 warnings
  /Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/sktime/performance_metrics/forecasting/_functions.py:1941: FutureWarning: In the percentage error metric functions the default argument symmetric=True is changing to symmetric=False in v0.12.0.
    FutureWarning,

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform darwin, python 3.7.12-final-0 ----------
---------------------- coverage: failed workers ----------------------
The following workers failed to return coverage data, ensure that pytest-cov is installed on these workers.
gw0
Coverage HTML written to dir htmlcov

============================= slowest 20 durations =============================
47.79s call     classification/distance_based/_proximity_forest.py::sktime.classification.distance_based._proximity_forest.ProximityForest
42.98s call     classification/compose/_column_ensemble.py::sktime.classification.compose._column_ensemble.ColumnEnsembleClassifier
28.47s call     forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[Prophet-update_params=True-window_length3-fh=1]
28.12s call     forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[Prophet-update_params=True-5-fh=1]
27.90s call     forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[Prophet-update_params=True-window_length2-fh=1]
27.82s call     forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[Prophet-update_params=True-1-fh=1]
27.74s call     forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[Prophet-update_params=True-window_length4-fh=1]
27.73s call     forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[Prophet-update_params=True-window_length5-fh=1]
27.57s call     forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[Prophet-update_params=False-step=1-window_length4-fh=1]
27.33s call     forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[Prophet-update_params=False-step=1-1-fh=1]
27.16s call     forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[Prophet-update_params=False-step=1-window_length3-fh=1]
27.11s call     forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[Prophet-update_params=False-step=1-window_length2-fh=1]
27.05s call     forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[Prophet-update_params=False-step=1-window_length5-fh=1]
26.94s call     forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[Prophet-update_params=False-step=1-5-fh=1]
26.54s call     forecasting/statsforecast.py::sktime.forecasting.statsforecast.StatsForecastAutoARIMA
25.92s call     classification/distance_based/tests/test_proximity_forest.py::test_pf_on_unit_test_data
23.90s call     tests/test_all_estimators.py::TestAllEstimators::test_fit_idempotent[MatrixProfileTransformer-TransformerFitTransformSeriesUnivariate]
20.34s call     forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[Prophet-update_params=True-5-fh=[2 5]]
19.99s call     tests/test_all_estimators.py::TestAllEstimators::test_raises_not_fitted_error[ARIMA-ForecasterFitPredictUnivariateNoX]
19.81s call     forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[Prophet-update_params=True-1-fh=[2 5]]
=========================== short test summary info ============================
FAILED benchmarking/tests/test_evaluator.py::test_plots
= 1 failed, 20210 passed, 1684 skipped, 5 xfailed, 1 xpassed, 123303 warnings in 2851.60s (0:47:31) =
make: *** [test] Error 1
Error: Process completed with exit code 2.
```

**Additional context**


The failure does not seem to be related to anything in the PR itself: after updating the branch with newer commits, the pipeline succeeded again.
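
Because the failure is sporadic, it may help to scan several CI logs for the same crash line and see whether `test_plots` is always the test involved. Below is a minimal sketch, assuming the log format shown above. `find_crashed_tests` is a hypothetical helper written for this issue, not part of sktime, pytest, or pytest-xdist.

```python
import re

# Pattern for the crash line seen in the log above, e.g.
#   worker 'gw0' crashed while running 'path/to/test_file.py::test_name'
# Assumption: other runs use the same wording for worker crashes.
CRASH_LINE = re.compile(
    r"worker '(?P<worker>[^']+)' crashed while running '(?P<test>[^']+)'"
)


def find_crashed_tests(log_text):
    """Return (worker, test_id) pairs for every crash line in a CI log."""
    return [
        (match.group("worker"), match.group("test"))
        for match in CRASH_LINE.finditer(log_text)
    ]


# The crash line copied verbatim from the failed run above.
sample = (
    "worker 'gw0' crashed while running "
    "'benchmarking/tests/test_evaluator.py::test_plots'"
)
print(find_crashed_tests(sample))
```

Running this over the raw logs of a few failed jobs would show whether the crash always hits the same test or moves between tests scheduled on the same worker.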

[BUG] sporadic test failure of `test_evaluator` `test_plots` on Mac Python 3.7 #2368
