Clean up how I code non-stationary bandits, and especially the regret #153

@Naereen

My Environment class is getting too messy, especially in how it handles the different cases of stationary and non-stationary bandits.

  • I should remove the optimization based on the self._historyOfChangePoints and self._historyOfMeans dictionaries in NonStationaryMAB, and just store the full list of means, of size K * T!
  • The size of this full list does not matter: it is of order T anyway, which is the storage already needed for the actual rewards.
  • Then override the .maxArm attribute of env in Evaluator so that the previously written methods for regret, last regret, accumulated rewards, etc. keep working unchanged.
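
A minimal sketch of the dense-storage idea, under assumptions: `full_means` and `cumulative_regret` are hypothetical helper names (not the existing Environment or Evaluator API), the means are stored as an array of shape (T, K), and the problem is piecewise stationary with known change points. The regret is measured against the best arm at each time step rather than a single fixed best arm.

```python
import numpy as np


def full_means(change_points, means_per_segment, horizon):
    """Expand a piecewise-stationary problem into a dense (horizon, K) array.

    change_points: sorted time steps at which a segment starts (first one is 0).
    means_per_segment: one list of K means per segment.
    Both argument names are illustrative assumptions.
    """
    means_per_segment = np.asarray(means_per_segment, dtype=float)
    change_points = np.asarray(change_points, dtype=int)
    # For every t, find the index of the segment that contains t.
    segment_of_t = np.searchsorted(change_points, np.arange(horizon), side="right") - 1
    return means_per_segment[segment_of_t]


def cumulative_regret(all_means, pulls):
    """Cumulative pseudo-regret against the best arm at each time step."""
    t = np.arange(len(pulls))
    best = all_means.max(axis=1)   # time-varying analogue of a fixed best mean
    chosen = all_means[t, pulls]   # mean of the arm actually pulled at time t
    return np.cumsum(best - chosen)


# Two arms, one change point at t = 3, horizon T = 6.
means = full_means([0, 3], [[0.1, 0.9], [0.8, 0.2]], horizon=6)
pulls = np.array([1, 1, 1, 1, 0, 0])  # the player reacts one step late
regret = cumulative_regret(means, pulls)
```

With this layout the memory cost is K * T floats, and a stationary problem is just the special case where every row of the array is identical, so the same regret code could cover both cases.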

For plots

  • Giving the number of breakpoints in the ylabel is fine, but maybe I could add red star markers on the x-axis at each breakpoint? Or a dashed vertical line?
  • Fixed for regret (I think, at least for PieceWiseStationaryMAB)
  • TODO: fix this for the best arm pull selections too!
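
The two plotting points above could be sketched as follows, assuming matplotlib is used for the plots and that the means are available as a dense (T, K) array. `mark_breakpoints` and `best_arm_pull_rate` are hypothetical helpers, not existing Evaluator methods; the data below is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def mark_breakpoints(ax, change_points, **kwargs):
    """Draw a dashed red vertical line at each change point; return the lines."""
    style = dict(color="red", linestyle="--", linewidth=1, alpha=0.7)
    style.update(kwargs)
    return [ax.axvline(x=tau, **style) for tau in change_points]


def best_arm_pull_rate(all_means, pulls):
    """Running fraction of pulls that hit the best arm at that time step.

    Ties between arms are broken in favour of the lowest index (argmax).
    """
    best_arms = all_means.argmax(axis=1)
    hits = pulls == best_arms
    return np.cumsum(hits) / np.arange(1, len(pulls) + 1)


# Illustrative data: two arms, the best arm switches at t = 3.
all_means = np.array([[0.1, 0.9]] * 3 + [[0.8, 0.2]] * 3)
pulls = np.array([1, 1, 1, 1, 0, 0])
rate = best_arm_pull_rate(all_means, pulls)

fig, ax = plt.subplots()
ax.plot(rate)
ax.set_xlabel("Time steps")
ax.set_ylabel("Best arm pull rate (1 breakpoint)")
lines = mark_breakpoints(ax, [3])
plt.close(fig)
```

Red star markers on the x-axis would be an alternative to the dashed lines, for example with `ax.plot(change_points, [0] * len(change_points), "r*")`, if vertical lines make the plot too busy.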

Labels

bug (WOO I have to fix something urgent!), non-stationary (For non-stationary bandits simulations)
