# isONclust3

A Rust implementation of a novel de novo clustering algorithm.

isONclust3 is a tool for clustering either PacBio Iso-Seq reads or Oxford Nanopore reads into clusters, where each cluster represents all reads that came from a gene family. The output is a tsv file with each read assigned to a cluster ID, and a folder containing one fastq file per generated cluster. Detailed information is available in the isONclust3 paper.

# Table of contents

1. [Installation](#installation)
2. [Running isONclust3](#running-isonclust3)
3. [Output](#output)
4. [Contact](#contact)
5. [Credits](#credits)

## Installation Guide

At the moment, building from source is the only option to install the tool. This requires users to install the Rust programming language on their system.

## Installing Rust

You can install Rust via
`curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh` (for macOS, Linux, or another Unix-like OS). For Windows, please follow the instructions on the following site: https://forge.rust-lang.org/infra/other-installation-methods.html.
## Installation

After cloning the repository via `git clone https://github.com/aljpetri/isONclust3.git`, use the following two commands to compile the code:
`cd isONclust3`
`cargo build --release` (compiles the current package; the executable is then located in `target/release`)
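
The usage examples in the next section call `isONclust3` by name, which only works if the shell can find the executable. A minimal sketch, assuming a POSIX-compatible shell and that it is run from inside the cloned `isONclust3` directory after the build has finished, is to add the build output directory to `PATH` for the current session:

```shell
# Assumption: the current directory is the cloned isONclust3 repository
# and `cargo build --release` has already completed.
# Prepend the build output directory to PATH for this shell session only.
export PATH="$PWD/target/release:$PATH"
```

If the build succeeded, `command -v isONclust3` should then print the path to the executable. The change is not persistent; alternatively, the binary can be called through its full path in `target/release`.
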
## Running isONclust3

isONclust3 can be used on either PacBio data or ONT data.

```
isONclust3 --fastq {input.fastq} --mode ont --outfolder {outfolder}     # Oxford Nanopore reads
isONclust3 --fastq {input.fastq} --mode pacbio --outfolder {outfolder}  # PacBio reads
```

The `--mode ont` argument is equivalent to setting `--k 13 --w 21`. The `--mode pacbio` argument is equivalent to setting `--k 15 --w 51`.

## Output

### Clustering information

The output consists of a tsv file, `final_clusters.tsv`, located in the specified output folder. In this file, the first column is the cluster ID and the second column is the read accession. For example:

```
0	read_X_acc
0	read_Y_acc
...
n	read_Z_acc
```

If there are n reads, there will be n rows. Some reads might be singletons.

### Clusters

isONclust3 outputs the reads in .fastq file format, with each file containing the reads for the respective cluster. The .fastq files are located in the `fastq_files` directory that is created in the given outfolder.

## Contact

If you encounter any problems, please raise an issue on the issues page. You can also contact the developer of this repository via: alexander.petri[at]math.su.se

## Credits