Clean up how I code non-stationary bandits, and especially the regret #153

My `Environment` class is getting too messy, especially in how I handle the different cases of stationary and non-stationary bandits.

- [x] I should remove the optimization that uses the `self._historyOfChangePoints` and `self._historyOfMeans` dictionaries in `NonStationaryMAB`, and just store the full list of means, of size K * T!
- [x] I should just store this full list and not worry about its size: it will be of the order of T anyway, which is the storage space already needed to store the actual rewards!
- [x] And trick the `.maxArm` attribute of `env` in `Evaluator` so that the previously written methods for regret, last regret, accumulated rewards, etc. stay the same.
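
As a rough illustration of the items above, here is a minimal sketch of how a dense array of means could replace the two history dictionaries, and how a per-time-step best mean can play the role of the scalar `.maxArm` in a regret computation. The function names `build_means_history` and `cumulated_regret`, and the `(T, K)` array orientation, are assumptions made for this example; they are not the actual code of `NonStationaryMAB` or `Evaluator`.

```python
import numpy as np


def build_means_history(change_points, means_per_segment, horizon):
    """Expand piecewise-constant means into a dense (horizon, K) array.

    change_points: sorted time steps at which a new segment starts (first one is 0).
    means_per_segment: one length-K sequence of arm means per segment.
    Both names are hypothetical, chosen for this sketch only.
    """
    change_points = np.asarray(change_points, dtype=int)
    means_per_segment = np.asarray(means_per_segment, dtype=float)
    if change_points[0] != 0 or len(change_points) != len(means_per_segment):
        raise ValueError("need one vector of means per segment, first segment starting at 0")
    # Index of the segment that is active at each time step t = 0, ..., horizon - 1.
    segment = np.searchsorted(change_points, np.arange(horizon), side="right") - 1
    return means_per_segment[segment]


def cumulated_regret(means_history, choices):
    """Cumulated pseudo-regret against the best arm at each time step."""
    means_history = np.asarray(means_history, dtype=float)
    choices = np.asarray(choices, dtype=int)
    # Vector of length T: the time-dependent counterpart of a scalar best mean.
    best_mean = means_history.max(axis=1)
    chosen_mean = means_history[np.arange(len(choices)), choices]
    return np.cumsum(best_mean - chosen_mean)


# Two arms, one break point at t = 3, horizon T = 5, always pulling arm 1.
means_history = build_means_history([0, 3], [[0.1, 0.9], [0.8, 0.2]], horizon=5)
regret = cumulated_regret(means_history, choices=[1, 1, 1, 1, 1])
```

Because the best mean is a vector indexed by time, formulas written for a scalar best mean keep working through NumPy broadcasting, and a stationary problem is just the special case with a single segment.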

## For plots
- [x] Giving the number of break points in the y-label is fine, but maybe I could add red star markers on the x-axis at each break point? Or a dashed vertical line?
- [x] Fixed for the regret (I think, at least for `PieceWiseStationaryMAB`).
- [x] TODO fix for best arm pull selections!
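
For the dashed vertical line option, a minimal sketch with Matplotlib could look like the following. The helper name `mark_change_points` and the placeholder curve are assumptions for this example only; they are not taken from the plotting code of `Evaluator`, and the curve is not a real experimental result.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np


def mark_change_points(ax, change_points, horizon):
    """Draw one dashed red vertical line per break point strictly inside (0, horizon)."""
    lines = []
    for tau in change_points:
        if 0 < tau < horizon:
            lines.append(ax.axvline(tau, color="red", linestyle="--", linewidth=1, alpha=0.7))
    return lines


horizon = 1000
change_points = [0, 300, 700]  # hypothetical break points; t = 0 is not drawn
placeholder_regret = np.cumsum(np.full(horizon, 0.05))  # placeholder curve, not real data

fig, ax = plt.subplots()
ax.plot(np.arange(horizon), placeholder_regret, label="cumulated regret (placeholder)")
lines = mark_change_points(ax, change_points, horizon)
ax.set_xlabel("Time steps")
ax.set_ylabel(f"Cumulated regret ({len(lines)} break points)")
ax.legend()
```

The same helper can be called on any other axis that uses time steps as x-coordinates, for instance a plot of best arm pull selections.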
