# rl-bandit: A rust bandit implementation Simple multi-armed bandit algorithm implementation. Implements several bandit algorithms (most of them found in ``Reinforcement Learning: An Introduction'' by *Richard S. Sutton* and *Andrew G. Barto*. It is available for free at http://www.incompleteideas.net/book/the-book-2nd.html). ![](img/comp1.png) # Usage example Initialize the bandit algorithm (a few examples) ```rust // ε-greedy algorithm with 10 arms, ε=0.1, initial values of 0 let egreedy1 = EGreedy::new(10, 0.1, 0.0, UpdateType::Average)); // same ε-greedy but non-stationary with step size of 0.1 let egreedy2 = EGreedy::new(10, 0.1, 0.0, UpdateType::Nonstationary(0.1)); // Upper Confidence Bound with 10 arms and c=1 let ucb1 = UCB::new(10, 1.); // Stochastic gradient with 10 arms, step size of 0.1, with baseline let sg1 = StochasticGradient::new(10, 0.1, true); ``` feedback loop: ```rust // choose the best action according to the bandit algorithm let action = ucb1.choose(); let reward = [...]; // using the action and computing the reward // updates the bandit algorithm using the reward ucb1.update(action, reward); ``` **Note:** A more detailed example and benchmark can be found in the [rl-bandit-bench](https://gitlab.com/librallu/rl-bandit-bench) crate. # implemented algorithms: - [X] ε-greedy - [X] optimistic ε-greedy - [X] Upper-Confidence-Bound (UCB) - [X] Stochastic Gradient Ascent - [ ] EXP3 # library contents - **bandit.rs** traits required to implement a bandit algorithm and helper functions - **src/bandits:** contains implemented bandit algorithms