# Changelog All notable changes to this project will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). ## [0.1.1] - hot fix for broken python bindings; remove IGD from the python bindings for now ## [0.1.0] - Rust implementation of `uniwig` that expands on the C++ version - Uniwig now accepts a single sorted `.bed` file, `.narrowPeak` file, or `.bam` file. - Outputs now include `.wig`, `.npy`, `.bedGraph`, and `.bw` - Accumulations can now be counted via `.narrowPeak` scoring - Rust implementation of `igd` ported from the C version (experimental). - Region scoring matrix calculation for region clustering - Fragment file splitter for pseudobulking ## [0.0.15] - added meta tokenization tools and a new `MetaTokenizer` struct that can be used to tokenize regions using the meta-token strategy. - added some annotations to the `pyo3` `#[pyclass]` and `#[pymethods]` attributes to make the python bindings more readable. ## [0.0.14] - renamed repository to `gtars` to better reflect the project's goals. ## [0.0.13] - implemented a fragment file tokenizer that will generate `.gtok` files directly from `fragments.tsv.gz` files. - fix an off-by-one error in the `region-to-id` maps in the `Universe` structs. This was leading to critical bugs in our models. ## [0.0.12] - optimize creation of `PyRegionSet` to reduce expensive cloning of `Universe` structs. ## [0.0.11] - redesigned API for the tokenizers to better emulate the huggingface tokenizers API. - implemented new traits for tokenizers to allow for more flexibility when creating new tokenizers. - bumped the version `pyo3` to `0.21.0` - added `rust-numpy` dependency to the python bindings for exporting tokenized regions as numpy arrays. - overall stability improvements to the tokenizers and the python bindings. ## [0.0.10] - update file format specifications ## [0.0.9] - start working on the concept of a `.gtok` file-format to store tokenized regions - added basic readers and writers for this format ## [0.0.8] - add a new `ids_as_strs` getter to the `TokenizedRegionSet` struct so that we can get the ids as strings quickly, this is meant mostly for interface with geniml. ## [0.0.7] - move things around based on rust club feedback ## [0.0.6] - update python bindings to support the module/submodule structure (https://github.com/PyO3/pyo3/issues/759#issuecomment-1828431711) - change name of some submodules - remove `consts` submodule, just add to base - expose a `__version__` attribute in the python bindings ## [0.0.5] - add many "core utils" - move `gtokenizers` into this package inside `gtars::tokenizers` - create `tokenize` cli - add tests for core utils and tokenizers - RegionSet is now backed by a polars DataFrame - new python bindings for core utils and tokenizers ## [0.0.4] - add type annotations to the python bindings ## [0.0.3] - work on python bindings initialization ## [0.0.2] - prepare for first release ## [0.0.1] - initial setup of repository - two main wrappers: 1) wrapper binary crate, and 2) wrapper library crate - `gtars` can be used as a library crate. or as a command line tool