Crates.io | barkit-extract |
lib.rs | barkit-extract |
version | 0.1.1 |
source | src |
created_at | 2024-08-26 15:36:55.141101 |
updated_at | 2024-09-02 18:46:20.786812 |
description | Tool for extracting barcode nucleotide sequence according to a specified regex pattern |
homepage | https://github.com/nsyzrantsev/barkit |
repository | https://github.com/nsyzrantsev/barkit |
id | 1352360 |
size | 40,927 |
BarKit (Barcodes ToolKit) is a toolkit designed for manipulating FASTQ barcodes.
BarKit can be installed from crates.io using cargo with the following command:
cargo install barkit
Alternatively, build BarKit from source:
git clone https://github.com/nsyzrantsev/barkit
cd barkit/
cargo build --release && sudo mv target/release/barkit /usr/local/bin/
The extract subcommand is designed to parse barcode sequences from FASTQ reads using approximate regex matching based on a provided pattern.
Each parsed barcode sequence is moved to the read header together with its base qualities, as colon-separated fields:
@SEQ_ID UMI:ATGC:???? CB:ATGC:???? SB:ATGC:????
UMI: Unique Molecular Identifier (Molecular Barcode)
CB: Cell Barcode
SB: Sample Barcode
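The header layout above can be sketched in a few lines of Python. This is only an illustration of the format, not BarKit's implementation; the helper name `tag_header` and the tuple layout are hypothetical.

```python
def tag_header(header, barcodes):
    """Append TAG:SEQUENCE:QUALITY fields to a FASTQ header line.

    `barcodes` is a list of (tag, sequence, quality) tuples, for example
    ("UMI", "ATGC", "????"). Hypothetical helper, for illustration only.
    """
    fields = [f"{tag}:{seq}:{qual}" for tag, seq, qual in barcodes]
    # Fields are separated from the read ID and from each other by spaces.
    return " ".join([header] + fields)


header = tag_header(
    "@SEQ_ID",
    [("UMI", "ATGC", "????"), ("CB", "ATGC", "????"), ("SB", "ATGC", "????")],
)
print(header)  # @SEQ_ID UMI:ATGC:???? CB:ATGC:???? SB:ATGC:????
```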
Parse the first twelve nucleotides as a UMI from each forward read:
barkit extract -1 <IN_FASTQ1> -2 <IN_FASTQ2> -p "^(?P<UMI>[ATGCN]{12})" -o <OUT_FASTQ1> -O <OUT_FASTQ2>
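To see what this pattern captures, here is a rough Python sketch that applies the same regex to a single read with the standard `re` module. It performs exact matching only, and it assumes that "moved" means the barcode and its qualities are trimmed from the read; the function `extract_umi` and the sample read are hypothetical and do not come from BarKit.

```python
import re

# Same pattern as in the command above: a named group capturing the
# first twelve nucleotides of the read.
UMI_PATTERN = re.compile(r"^(?P<UMI>[ATGCN]{12})")


def extract_umi(header, seq, qual):
    """Return (new_header, trimmed_seq, trimmed_qual), or None if no match.

    Assumption: the barcode is removed from the read and appended to the
    header as UMI:<sequence>:<quality>.
    """
    match = UMI_PATTERN.match(seq)
    if match is None:
        return None
    start, end = match.span("UMI")
    umi, umi_qual = seq[start:end], qual[start:end]
    new_header = f"{header} UMI:{umi}:{umi_qual}"
    return new_header, seq[end:], qual[end:]


# Made-up read: 12 barcode bases followed by 8 bases of insert.
result = extract_umi("@read1", "ACGTACGTACGTTTGGCCAA", "IIIIIIIIIIIIFFFFFFFF")
print(result)
# ('@read1 UMI:ACGTACGTACGT:IIIIIIIIIIII', 'TTGGCCAA', 'FFFFFFFF')
```

Reads shorter than twelve nucleotides, or containing other characters in the first twelve positions, return `None` in this sketch.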
Parse the first sixteen nucleotides as a cell barcode from each reverse read before the atgccat adapter sequence:
barkit extract -1 <IN_FASTQ1> -2 <IN_FASTQ2> -P "^(?P<CB>[ATGCN]{16})atgccat" -o <OUT_FASTQ1> -O <OUT_FASTQ2>
Note: Use lowercase letters for fuzzy match patterns.
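To illustrate why approximate matching is useful for an adapter such as the one above, the sketch below accepts the adapter with up to one substitution, using a plain Hamming distance. This is a stand-in technique chosen for simplicity, not BarKit's matching algorithm, and it does not handle insertions or deletions; `find_cell_barcode`, `max_mismatches`, and the sample reads are hypothetical.

```python
ADAPTER = "ATGCCAT"
BARCODE_LEN = 16


def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_cell_barcode(seq, max_mismatches=1):
    """Return the 16-nt barcode if the adapter follows it, else None.

    The barcode must match [ATGCN]{16} exactly; the adapter may differ
    from ADAPTER in at most `max_mismatches` positions.
    """
    barcode = seq[:BARCODE_LEN]
    candidate = seq[BARCODE_LEN:BARCODE_LEN + len(ADAPTER)]
    if len(barcode) < BARCODE_LEN or len(candidate) < len(ADAPTER):
        return None
    if any(base not in "ATGCN" for base in barcode):
        return None
    if hamming(candidate, ADAPTER) > max_mismatches:
        return None
    return barcode


barcode = "AAAACCCCGGGGTTTT"
print(find_cell_barcode(barcode + "ATGCCAT" + "GGA"))  # exact adapter
print(find_cell_barcode(barcode + "ATGACAT" + "GGA"))  # one substitution
print(find_cell_barcode(barcode + "TTGACAT" + "GGA"))  # two substitutions
```

The first two calls return the barcode and the third returns `None`, since two substitutions exceed the allowed single mismatch.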