Crates.io | tilecoding |
lib.rs | tilecoding |
version | 0.3.0 |
source | src |
created_at | 2019-05-16 18:19:34.928421 |
updated_at | 2019-06-07 18:29:29.142038 |
description | Rust implementation of Dr. Richard S. Sutton's tile coding software. |
homepage | |
repository | https://github.com/hamaluik/tilecoding-rs |
max_upload_size | |
id | 134727 |
size | 44,680 |
This is a Rust version of the Tile Coding Software developed by Dr. Richard S. Sutton available on his website at: http://incompleteideas.net/tiles/tiles3.html. It is strongly suggested to read that page in full as it describes the reasoning behind this software, how to use it, and gives examples.
To quote the introduction there:
Here we describe software implementing the core part of the idea of tile coding as described in Section 9.5.4 of the reinforcement-learning textbook by Sutton and Barto. That section should be read and fully understood before reading this manual or using this software. The software is currently available in Python and Common Lisp. An implementation in C or C++ would be a welcome contribution.
Tile coding is a way of representing the values of a vector of continuous variables as a large binary vector with few 1s and many 0s. The binary vector is not represented explicitly, but as a list of the components that are 1s. The main step is to partition, or tile, the continuous space multiple times and select one tile from each tiling, that corresponding the the vector's value. Each tile is converted to an element in the big binary vector, and the list of the tile (element) numbers is returned as the representation of the vector's value. Tile coding is recommended as a way of applying online learning methods to domains with continuous state or action variables.
The tile-coding software evolved from software developed at the University of New Hampshire for a related but more specialized use called "CMAC" (see the external documentation by Miller and Glanz). Here we separate tile coding from full CMACs, which allows it to be used more flexibly. The current software is also simplified and streamlined consistent with its use in reinforcement learning applications. For example, our code is only one page in length.
This version provides two implementations:
// initialize an index-hash-table with size `1024`
let mut iht = IHT::new(1024);
// find the indices of tiles for the point (x, y) = (3.6, 7.21) using 8 tilings:
let indices = iht.tiles(8, &[3.6, 7.21], &[]);
// this is the first time we've used the IHT, so we will get the starting tiles:
assert_eq!(indices, vec![0, 1, 2, 3, 4, 5, 6, 7]);
// a nearby point:
let indices = iht.tiles(8, &[3.7, 7.21], &[]);
// differs by one tile:
assert_eq!(indices, vec![0, 1, 2, 8, 4, 5, 6, 7]);
// and a point more than one away in any dim
let indices = iht.tiles(8, &[-37.2, 7.0], &[]);
// will have all different tiles
assert_eq!(indices, vec![9, 10, 11, 12, 13, 14, 15, 16]);