barkit-extract

crates.io: barkit-extract
lib.rs: barkit-extract
version: 0.1.1
source: src
created_at: 2024-08-26 15:36:55.141101
updated_at: 2024-09-02 18:46:20.786812
description: Tool for extracting barcode nucleotide sequence according to a specified regex pattern
homepage: https://github.com/nsyzrantsev/barkit
repository: https://github.com/nsyzrantsev/barkit
id: 1352360
size: 40,927
Nikita Syzrantsev (nsyzrantsev)


BarKit

BarKit (Barcodes ToolKit) is a toolkit designed for manipulating FASTQ barcodes.

Installation

From crates.io

BarKit can be installed from crates.io using cargo:

cargo install barkit

Build from source

  1. Clone the repository:
git clone https://github.com/nsyzrantsev/barkit
cd barkit/
  2. Build the release binary and move it to /usr/local/bin/:
cargo build --release && sudo mv target/release/barkit /usr/local/bin/

Extract subcommand

The extract subcommand is designed to parse barcode sequences from FASTQ reads using approximate regex matching based on a provided pattern.

Each parsed barcode sequence is moved to the read header together with its base qualities, as colon-separated fields:

@SEQ_ID UMI:ATGC:???? CB:ATGC:???? SB:ATGC:????
  • UMI: Unique Molecular Identifier (Molecular Barcode)

  • CB: Cell Barcode

  • SB: Sample Barcode
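
For downstream scripts, the header layout above can be read back with a few lines of Python. This is a minimal sketch, not part of BarKit: the function name `parse_barcode_tags` is hypothetical, and it assumes that every whitespace-separated field after the read ID has the TAG:SEQUENCE:QUALITY form shown above.

```python
def parse_barcode_tags(header: str) -> dict:
    """Split a FASTQ header into {tag: (sequence, quality)} pairs."""
    # The first whitespace-separated field is the read ID; the rest are tags.
    _read_id, *fields = header.split()
    tags = {}
    for field in fields:
        # Split at the first two colons only, because ':' is itself a valid
        # Phred+33 quality character and may appear in the quality string.
        name, sequence, quality = field.split(":", 2)
        tags[name] = (sequence, quality)
    return tags


tags = parse_barcode_tags("@SEQ_ID UMI:ATGC:???? CB:ATGC:???? SB:ATGC:????")
print(tags["UMI"])
```

Headers that carry other comment fields (for example Illumina read descriptors) would need extra filtering before this split.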

Examples

Parse the first twelve nucleotides as a UMI from each forward read:

barkit extract -1 <IN_FASTQ1> -2 <IN_FASTQ2> -p "^(?P<UMI>[ATGCN]{12})" -o <OUT_FASTQ1> -O <OUT_FASTQ2>
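
To see what this pattern captures on a single read, the same named group can be tried with Python's `re` module, which accepts the `(?P<name>...)` syntax. This is only an illustrative sketch: the function `extract_umi` is hypothetical, Python's `re` matches exactly rather than approximately, and cutting the barcode out of the read and skipping unmatched reads are assumptions about the behavior, not confirmed details of BarKit.

```python
import re

# Same pattern as in the command above.
UMI_PATTERN = re.compile(r"^(?P<UMI>[ATGCN]{12})")


def extract_umi(header, sequence, quality):
    """Return (header, sequence, quality) with the UMI moved into the header."""
    match = UMI_PATTERN.search(sequence)
    if match is None:
        return None  # assumption: reads without a match are skipped
    start, end = match.span("UMI")
    umi_seq, umi_qual = sequence[start:end], quality[start:end]
    new_header = f"{header} UMI:{umi_seq}:{umi_qual}"
    # Assumption: the barcode and its qualities are cut out of the read.
    trimmed_seq = sequence[:start] + sequence[end:]
    trimmed_qual = quality[:start] + quality[end:]
    return new_header, trimmed_seq, trimmed_qual


# Toy read: a 12-nt UMI followed by 8 nt of insert.
record = extract_umi("@SEQ_ID", "ACGTACGTACGT" + "TTGGCCAA", "I" * 12 + "F" * 8)
print(record)
```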

Parse the first sixteen nucleotides of each reverse read as a cell barcode when they are followed by the atgccat adapter sequence:

barkit extract -1 <IN_FASTQ1> -2 <IN_FASTQ2> -P "^(?P<CB>[ATGCN]{16})atgccat" -o <OUT_FASTQ1> -O <OUT_FASTQ2>

Note: Use lowercase letters for the parts of the pattern that should be matched approximately (fuzzy), as with the atgccat adapter above.
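
The idea behind a fuzzy adapter match can be sketched in Python as follows. This is a simplified illustration, not BarKit's algorithm: it tolerates substitutions only (Hamming distance), whereas approximate regex matching generally also allows insertions and deletions, and the names `extract_cell_barcode` and `max_mismatches` are hypothetical rather than BarKit options.

```python
def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def extract_cell_barcode(sequence, adapter="ATGCCAT", barcode_len=16, max_mismatches=1):
    """Return the leading barcode if the adapter follows it within the mismatch budget."""
    barcode = sequence[:barcode_len]
    candidate = sequence[barcode_len:barcode_len + len(adapter)]
    if len(barcode) < barcode_len or len(candidate) < len(adapter):
        return None  # read too short to hold barcode plus adapter
    if any(base not in "ATGCN" for base in barcode):
        return None  # the barcode part is matched exactly against [ATGCN]
    if hamming(candidate, adapter) > max_mismatches:
        return None  # adapter differs in too many positions
    return barcode


barcode = "ACGTACGTACGTACGT"
exact = extract_cell_barcode(barcode + "ATGCCAT" + "GGG")
one_mismatch = extract_cell_barcode(barcode + "ATGACAT" + "GGG")
two_mismatches = extract_cell_barcode(barcode + "TTGACAT" + "GGG")
print(exact, one_mismatch, two_mismatches)
```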
