mcts-rs

version: 0.1.0
description: A Rust implementation of Monte Carlo Tree Search (MCTS) using an arena allocator.
repository: https://github.com/PaytonWebber/mcts-rs
author: Payton Webber (PaytonWebber)
created_at: 2024-10-22
updated_at: 2024-10-22
id: 1418299
size: 21,343

README

Rust Monte Carlo Tree Search (MCTS) with Arena Allocator


A Rust implementation of the Monte Carlo Tree Search (MCTS) algorithm using an arena allocator for efficient memory management. This project features a Tic-Tac-Toe game to showcase the MCTS algorithm in action.

Table of Contents

  • Introduction
  • Features
  • Getting Started
  • Running the Tic-Tac-Toe Example
  • Implementation Details
  • State Trait

Introduction

Monte Carlo Tree Search (MCTS) is a heuristic search algorithm for decision-making processes, most notably game playing. This project provides a Rust implementation of MCTS that manages memory with an arena allocator: all nodes are stored in a central arena, which avoids the overhead of reference counting and interior mutability and results in a more performant and idiomatic Rust codebase.

The included Tic-Tac-Toe game serves as a practical example of how to use the MCTS library.

Features

  • Efficient Memory Management: Utilizes an arena allocator to store nodes, reducing allocation overhead.
  • Generic State Management: Defines a State trait to allow MCTS to work with any game or decision process.

Getting Started

Prerequisites

  • Rust: Ensure you have Rust and Cargo installed. You can install Rust using rustup.

Installation

  1. Clone the repository:

    git clone https://github.com/PaytonWebber/mcts-rs.git
    cd mcts-rs
    

Running the Tic-Tac-Toe Example

To run the Tic-Tac-Toe game where the MCTS algorithm plays against itself with a random starting move, use the following command:

cargo run --example tic_tac_toe

Implementation Details

Arena Allocator

An arena allocator is used to efficiently manage memory for the nodes in the MCTS tree. Nodes are stored in a vector, and their relationships are represented by indices rather than pointers or references. This approach avoids the need for reference counting (Rc) and interior mutability (RefCell), leading to cleaner and more efficient code.
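The idea can be sketched as follows. Note that the `Node` and `Arena` shapes below are illustrative assumptions, not the crate's actual definitions (see node.rs and arena.rs for those):

```rust
// Illustrative arena: all nodes live in one Vec and refer to each other
// by index. Field names here are assumptions, not the crate's layout.
#[derive(Debug)]
struct Node {
    parent: Option<usize>, // index of the parent node, if any
    children: Vec<usize>,  // indices of child nodes
    visits: u32,           // visit count used during selection
    value: f32,            // accumulated simulation reward
}

struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    fn new() -> Self {
        Arena { nodes: Vec::new() }
    }

    /// Appends a node, wires up the parent-child link, and returns its index.
    fn add_node(&mut self, parent: Option<usize>) -> usize {
        let idx = self.nodes.len();
        self.nodes.push(Node { parent, children: Vec::new(), visits: 0, value: 0.0 });
        if let Some(p) = parent {
            self.nodes[p].children.push(idx);
        }
        idx
    }
}

fn main() {
    let mut arena = Arena::new();
    let root = arena.add_node(None);
    let child = arena.add_node(Some(root));
    // Plain indices replace Rc<RefCell<Node>> back-pointers.
    assert_eq!(arena.nodes[root].children, vec![child]);
    assert_eq!(arena.nodes[child].parent, Some(root));
}
```

Because indices are `Copy` and carry no lifetime, parent and child links can be stored and traversed freely without fighting the borrow checker.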

Benefits:

  • Performance: Reduced allocation overhead and improved cache locality.
  • Simplicity: Simplifies ownership and borrowing by avoiding complex lifetime issues.
  • Safety: Leverages Rust's safety guarantees without resorting to unsafe code.

MCTS Algorithm

The MCTS algorithm consists of four main steps:

  1. Selection: Starting from the root node, select child nodes based on the Upper Confidence Bound (UCB) until a leaf node is reached.
  2. Expansion: If the leaf node is not a terminal state, expand it by adding all possible child nodes.
  3. Simulation: Run a simulation from the expanded node to a terminal state by making random moves.
  4. Backpropagation: Update the nodes along the path with the simulation result.
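For the selection step, a common scoring rule is the UCB1 formula. The sketch below assumes the standard form with an exploration constant `c`; it is not necessarily the exact variant this crate implements:

```rust
// UCB1 score for a child node: mean reward plus an exploration bonus.
// An unvisited child gets infinity so it is always expanded first.
fn ucb1(parent_visits: u32, child_visits: u32, child_value: f32, c: f32) -> f32 {
    if child_visits == 0 {
        return f32::INFINITY;
    }
    let exploit = child_value / child_visits as f32;
    let explore = c * ((parent_visits as f32).ln() / child_visits as f32).sqrt();
    exploit + explore
}

fn main() {
    // An unvisited child dominates any visited one.
    assert_eq!(ucb1(10, 0, 0.0, 1.4), f32::INFINITY);
    // With equal visit counts, the higher-value child scores higher.
    assert!(ucb1(10, 5, 4.0, 1.4) > ucb1(10, 5, 2.0, 1.4));
}
```

Selection repeatedly descends to the child with the highest UCB1 score until it reaches a leaf, balancing exploitation of known-good moves against exploration of rarely visited ones.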

Key Components:

  • Node Struct (node.rs): Represents a node in the search tree.
  • Arena Struct (arena.rs): Stores all nodes and manages parent-child relationships between nodes.
  • MCTS Implementation (mod.rs): Contains the logic for selection, expansion, simulation, and backpropagation.

State Trait

The State trait abstracts the game logic, allowing the MCTS algorithm to work with any game or decision process that implements this trait. Here's the trait definition:

pub trait State {
    /// Checks if the specified player has won the game.
    fn player_has_won(&self, player: usize) -> bool;
    
    /// Determines if the current state is a terminal state (no further moves possible).
    fn is_terminal(&self) -> bool;
    
    /// Returns a vector of legal actions available from the current state.
    fn get_legal_actions(&self) -> Vec<(usize, usize)>;
    
    /// Returns the index of the player whose turn it is to play.
    fn to_play(&self) -> usize;
    
    /// Returns a new state resulting from applying the given action to the current state.
    fn step(&self, action: (usize, usize)) -> Self;
    
    /// Calculates and returns the reward for the specified player in the current state.
    fn reward(&self, player: usize) -> f32;
    
    /// Renders or prints the current state (useful for debugging or display purposes).
    fn render(&self);
}

By implementing this trait for your game or decision process, you can integrate it with the MCTS algorithm provided in this library. The tic_tac_toe.rs file offers an example implementation of the State trait for Tic-Tac-Toe.
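As a sketch of what such an implementation looks like, here is a hypothetical toy game (players alternately claim cells on a 1x3 strip, and whoever claims the last cell wins) implementing the trait. The trait definition is the one above; the `Strip` game itself is invented for illustration and is not the crate's tic_tac_toe.rs:

```rust
// The State trait as defined by the library.
pub trait State {
    fn player_has_won(&self, player: usize) -> bool;
    fn is_terminal(&self) -> bool;
    fn get_legal_actions(&self) -> Vec<(usize, usize)>;
    fn to_play(&self) -> usize;
    fn step(&self, action: (usize, usize)) -> Self;
    fn reward(&self, player: usize) -> f32;
    fn render(&self);
}

/// Hypothetical toy game: players alternately claim cells on a 1x3 strip;
/// the player who claims the last empty cell wins.
#[derive(Clone)]
struct Strip {
    cells: [Option<usize>; 3], // Some(player) once a cell is claimed
    to_play: usize,
    last_mover: Option<usize>,
}

impl Strip {
    fn new() -> Self {
        Strip { cells: [None; 3], to_play: 0, last_mover: None }
    }
}

impl State for Strip {
    fn player_has_won(&self, player: usize) -> bool {
        self.is_terminal() && self.last_mover == Some(player)
    }
    fn is_terminal(&self) -> bool {
        self.cells.iter().all(|c| c.is_some())
    }
    fn get_legal_actions(&self) -> Vec<(usize, usize)> {
        self.cells
            .iter()
            .enumerate()
            .filter(|(_, c)| c.is_none())
            .map(|(col, _)| (0, col)) // actions are (row, col) pairs
            .collect()
    }
    fn to_play(&self) -> usize {
        self.to_play
    }
    fn step(&self, action: (usize, usize)) -> Self {
        let mut next = self.clone();
        next.cells[action.1] = Some(self.to_play);
        next.last_mover = Some(self.to_play);
        next.to_play = 1 - self.to_play;
        next
    }
    fn reward(&self, player: usize) -> f32 {
        if self.player_has_won(player) { 1.0 } else { 0.0 }
    }
    fn render(&self) {
        println!("{:?}", self.cells);
    }
}

fn main() {
    let start = Strip::new();
    assert_eq!(start.get_legal_actions().len(), 3);
    let end = start.step((0, 0)).step((0, 1)).step((0, 2));
    assert!(end.is_terminal());
    assert!(end.player_has_won(0)); // player 0 claimed the last cell
    assert_eq!(end.reward(1), 0.0);
}
```

Note that `step` returns a fresh state rather than mutating in place, which fits the trait's `&self` signature and lets MCTS keep many states alive in the tree at once.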

Commit count: 19
