# nanoq
[![build](https://github.com/esteinig/nanoq/actions/workflows/rust-ci.yaml/badge.svg?branch=master)](https://github.com/esteinig/nanoq/actions/workflows/rust-ci.yaml)
[![codecov](https://codecov.io/gh/esteinig/nanoq/branch/master/graph/badge.svg?token=1X04YD8YOE)](https://codecov.io/gh/esteinig/nanoq)
![](https://img.shields.io/badge/version-0.10.0-black.svg)
[![DOI](https://joss.theoj.org/papers/10.21105/joss.02991/status.svg)](https://doi.org/10.21105/joss.02991)
Ultra-fast quality control and summary reports for nanopore reads
## Overview
**`v0.10.0`**
- [Purpose](#purpose)
- [Install](#install)
- [Usage](#usage)
- [Read filters](#read-filters)
- [Read trimming](#read-trimming)
- [Read report](#read-report)
- [Fast mode](#fast-mode)
- [Compression](#compression)
- [Online runs](#online-runs)
- [Parameters](#parameters)
- [Output](#output)
- [Benchmarks](#benchmarks)
- [Dependencies](#dependencies)
- [Etymology](#etymology)
- [Contributions](#contributions)
## Purpose
`Nanoq` implements ultra-fast read filters and summary reports for high-throughput nanopore reads.
## Citation
We would appreciate a citation if you are using `nanoq` for research. Please see [here](#etymology) for some suggestions on how you could give back to the community if you are using `nanoq` in industry applications :pray:
> Steinig and Coin (2022). Nanoq: ultra-fast quality control for nanopore reads. Journal of Open Source Software, 7(69), 2991, https://doi.org/10.21105/joss.02991
## Performance
`Nanoq` is as fast as `seqtk-fqchk` for summary statistics on small datasets (e.g. Zymo, 100,000 reads) and slightly faster on large datasets (e.g. Zymo, 3.5 million reads: 1.3x - 1.5x). In `fast` mode (no quality scores), `nanoq` is ~2-3x faster than `rust-bio-tools` and `seqkit stats` for summary statistics, and up to 297x-442x faster than other commonly used read summary or filtering methods. Memory consumption is consistent and tends to be ~5-10x lower than that of other tools.
## Tests
`Nanoq` comes with high test coverage for your peace of mind.
```bash
cargo test
```
## Install
#### `Cargo`
```bash
cargo install nanoq
```
#### `Conda`
Pin the version explicitly, as `conda` may otherwise resolve to an older release:
```bash
conda install -c conda-forge -c bioconda nanoq=0.10.0
```
#### `Binaries`
Precompiled binaries for Linux and macOS are attached to the latest release.
```bash
VERSION=0.10.0
RELEASE=nanoq-${VERSION}-x86_64-unknown-linux-musl.tar.gz
# Download and unpack the Linux release archive
wget https://github.com/esteinig/nanoq/releases/download/${VERSION}/${RELEASE}
tar xf ${RELEASE}
# Print the help message to confirm the binary runs
nanoq-${VERSION}-x86_64-unknown-linux-musl/nanoq -h
```
## Usage
`Nanoq` accepts a file (`-i`) or stream (`stdin`) of reads in `fast{a,q}.{gz,bz2,xz}` format and outputs reads to file (`-o`) or stream (`stdout`).
```bash
nanoq -i test.fq.gz -o reads.fq
cat test.fq.gz | nanoq > reads.fq
```
### Read filters
Reads can be filtered by minimum read length (`-l`), maximum read length (`-m`), minimum average read quality (`-q`) or maximum average read quality (`-w`).
```bash
nanoq -i test.fq -l 1000 -m 10000 -q 10 -w 15 > reads.fq
```
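As an independent cross-check of the length filters, the sketch below keeps four-line FASTQ records within a length range using `awk`. It is not how `nanoq` is implemented: the helper name `filter_len` is hypothetical, and inclusive bounds with `0` disabling the maximum are assumptions.

```bash
# Hypothetical helper: keep four-line FASTQ records whose sequence length
# lies within [min, max]; max=0 disables the upper bound (assumed semantics).
filter_len() {
    awk -v min="$1" -v max="$2" '
        NR % 4 == 1 { header = $0 }
        NR % 4 == 2 { seq = $0 }
        NR % 4 == 3 { plus = $0 }
        NR % 4 == 0 {
            n = length(seq)
            if (n >= min && (max == 0 || n <= max))
                print header "\n" seq "\n" plus "\n" $0
        }'
}

# Tiny demo input with reads of length 4, 8 and 12
demo=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTACGT\n+\nIIIIIIII\n@r3\nACGTACGTACGT\n+\nIIIIIIIIIIII\n' > "$demo"

# Print the headers of reads between 5 and 10 bases long
filter_len 5 10 < "$demo" | awk 'NR % 4 == 1'
```

The demo prints `@r2`, the only read between 5 and 10 bases long.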
### Read trimming
A fixed number of bases can be trimmed from the start (`-S`) or end (`-E`) of reads:
```bash
nanoq -i test.fq -S 100 -E 100 > reads.fq
```
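To make the effect of trimming concrete, the following sketch removes bases from both ends of each read with `awk`, trimming the quality string by the same amount so that sequence and qualities stay in step. `trim_reads` is a hypothetical helper and not part of `nanoq`; dropping reads that would be left empty is an assumption.

```bash
# Hypothetical helper: trim `s` bases from the start and `e` bases from the end
# of every read in a four-line FASTQ stream.
trim_reads() {
    awk -v s="$1" -v e="$2" '
        NR % 4 == 1 { header = $0 }
        NR % 4 == 2 { seq = $0 }
        NR % 4 == 0 {
            keep = length(seq) - s - e
            # Assumption: reads with no bases left after trimming are dropped
            if (keep > 0)
                print header "\n" substr(seq, s + 1, keep) "\n+\n" substr($0, s + 1, keep)
        }'
}

# Demo: trim 2 bases from the start and 3 from the end of an 8-base read
demo=$(mktemp)
printf '@r1\nAACCGGTT\n+\nABCDEFGH\n' > "$demo"
trim_reads 2 3 < "$demo"
```

For the demo read, the sequence becomes `CCG` and the quality string `CDE`.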
### Read report
Read summaries are produced with the stats flag (`-s`), which writes the report to `stdout` in place of the reads, or by specifying a report file (`-r`):
```bash
nanoq -i test.fq -s
nanoq -i test.fq -r report.txt > reads.fq
```
For report types and configuration see the [output section](#output).
### Fast mode
> :warning: When using fast mode (`-f`), read quality scores are not computed (quality fields are output as `NaN`).

Read qualities may be excluded from filters and statistics to speed up read iteration (`-f`).
```bash
nanoq -i test.fq.gz -f -s
```
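For a sense of the per-base work that fast mode skips, here is a minimal sketch of one common way to compute a mean read quality: convert each Phred score to an error probability, average, and convert back. Whether `nanoq` uses exactly this formula is an assumption, as are the Phred+33 encoding and the helper name `mean_qual`.

```bash
# Hypothetical helper: print read name and mean Phred quality for each record,
# averaging over per-base error probabilities rather than over raw scores.
mean_qual() {
    awk '
        BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
        NR % 4 == 1 { name = substr($0, 2) }
        NR % 4 == 0 && length($0) > 0 {
            sum = 0
            for (i = 1; i <= length($0); i++) {
                q = ord[substr($0, i, 1)] - 33   # Phred+33 offset (assumed encoding)
                sum += 10 ^ (-q / 10)            # per-base error probability
            }
            printf "%s\t%.2f\n", name, -10 * log(sum / length($0)) / log(10)
        }'
}

# Demo: r1 has four Q40 bases, r2 has one Q20 and one Q10 base
printf '@r1\nACGT\n+\nIIII\n@r2\nAC\n+\n5+\n' | mean_qual
```

In the demo, `r2` comes out at 12.60 rather than the arithmetic mean of its scores (15), because low-quality bases dominate the averaged error probability.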
### Compression
Output compression is inferred from file extensions (`gz`, `bz2`, `lzma`).
```bash
nanoq -i test.fq -o reads.fq.gz
```
Output compression type (`-O`) and level (`-c`) can also be set manually.
```bash
nanoq -i test.fq -O g -c 9 -o reads.fq.gz
```
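Once compressed output has been written, a quick integrity check can catch truncated files early. The sketch below uses standard `gzip` tools only; `count_reads_gz` is a hypothetical helper and assumes four-line FASTQ records.

```bash
# Hypothetical helper: verify a gzip-compressed FASTQ file and count its reads
count_reads_gz() {
    # Fail early if the archive is corrupt or truncated
    gzip -t < "$1" || return 1
    # Four lines per record
    gzip -dc < "$1" | awk 'END { print NR / 4 }'
}

# Demo: compress two reads, then verify and count them
demo=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n' | gzip -c > "$demo"
count_reads_gz "$demo"
```

The same check could be pointed at a file such as `reads.fq.gz` from the examples above.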
### Online runs
`Nanoq` can be used to check on active sequencing runs and barcoded samples.
```bash
find /data/nanopore/run -name "*.fastq" -print0 | xargs -0 cat | nanoq -s
```
```bash
for i in {01..12}; do
    find /data/nanopore/run -name "barcode${i}.fastq" -print0 | xargs -0 cat | nanoq -s
done
```
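The barcode loop can be wrapped in a small function for repeated checks during a run. This is a sketch under assumptions: `barcode_stats` is a hypothetical helper, and it assumes one sub-directory per barcode (e.g. `barcode01/`) somewhere below the run directory, which may not match your layout. The summary command defaults to `nanoq -s` and can be swapped for any other command that reads from `stdin`.

```bash
# Hypothetical helper: pipe all FASTQ files of one barcode into a summary command.
# Usage: barcode_stats RUN_DIR BARCODE [COMMAND...]   (default command: nanoq -s)
barcode_stats() (
    run_dir=$1
    barcode=$2
    shift 2
    [ $# -gt 0 ] || set -- nanoq -s
    # Assumption: the barcode name appears in the path of its FASTQ files
    find "$run_dir" -type f -path "*${barcode}*" -name "*.fastq" -print0 | xargs -0 cat | "$@"
)

# Demo with a stand-in command that counts reads, so it runs without nanoq
run=$(mktemp -d)
mkdir -p "$run/barcode01" "$run/barcode02"
printf '@r1\nACGT\n+\nIIII\n' > "$run/barcode01/a.fastq"
printf '@r2\nACGT\n+\nIIII\n' > "$run/barcode01/b.fastq"
printf '@r3\nACGT\n+\nIIII\n' > "$run/barcode02/a.fastq"
barcode_stats "$run" barcode01 awk 'END { print NR / 4 }'
```

Calling `barcode_stats /data/nanopore/run barcode01` without a trailing command would run `nanoq -s` on that barcode.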
### Parameters
```
nanoq 0.10.0
Filters and summary reports for nanopore reads
USAGE:
nanoq [FLAGS] [OPTIONS]
FLAGS:
-f, --fast Ignore quality values if present
-h, --help Prints help information
-H, --header Header for summary output
-j, --json Summary report in JSON format
-s, --stats Summary report only [stdout]
-V, --version Prints version information
-v, --verbose Verbose output statistics [multiple, up to -vvv]
OPTIONS:
-c, --compress-level <1-9> Compression level to use if compressing output [default: 6]
-i, --input Fast{a,q}.{gz,xz,bz}, stdin if not present
-m, --max-len Maximum read length filter (bp) [default: 0]
-w, --max-qual Maximum average read quality filter (Q) [default: 0]
-l, --min-len Minimum read length filter (bp) [default: 0]
-q, --min-qual Minimum average read quality filter (Q) [default: 0]
-o, --output