Crates.io | scispeak |
lib.rs | scispeak |
version | 0.1.0 |
source | src |
created_at | 2023-10-11 02:14:46.050426 |
updated_at | 2023-10-11 02:14:46.050426 |
description | A tool for converting Sci-RNA-Seq3 files to 10X Genomics compatible FASTQ files. |
repository | https://github.com/noamteyssier/scispeak |
id | 999690 |
size | 185,565 |
A Rust parser that converts sci-seq-v3 reads into kallisto-compatible formats.
A CLI tool that whitelist-filters sci-seq-v3 reads and converts them to a 10X-style format.
This tool filters sci-seq reads against their respective barcode whitelists and then writes FASTQ files in the style of 10X reads.
It parses the sci-seq-v3 format, identifies the cell barcodes and UMIs, and writes new files that resemble the 10X sequence construct, so that the reads can be used with other tools that have not yet adopted the sci-seq format.
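The whitelist filtering step described above can be sketched roughly as follows. This is a minimal illustration, not scispeak's actual implementation: the `ReadPair` and `FilterStats` types and the `filter_by_whitelist` function are hypothetical names, and exact string matching against the whitelist is an assumption (the real tool may tolerate mismatches).

```rust
use std::collections::HashSet;

/// A paired read reduced to the fields this sketch needs (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct ReadPair {
    pub barcode: String, // barcode extracted from R1
    pub umi: String,     // UMI extracted from R1
    pub cdna: String,    // R2 sequence, passed through unchanged
}

/// Counts kept while filtering (hypothetical; not the tool's log schema).
#[derive(Debug, Default, PartialEq)]
pub struct FilterStats {
    pub total: usize,
    pub passing: usize,
}

/// Keep only the pairs whose barcode is an exact member of the whitelist.
/// Assumption: exact matching, with no mismatch correction.
pub fn filter_by_whitelist(
    pairs: Vec<ReadPair>,
    whitelist: &HashSet<String>,
) -> (Vec<ReadPair>, FilterStats) {
    let mut stats = FilterStats::default();
    let mut kept = Vec::new();
    for pair in pairs {
        stats.total += 1;
        if whitelist.contains(&pair.barcode) {
            stats.passing += 1;
            kept.push(pair);
        }
    }
    (kept, stats)
}
```

A hash set keeps each membership check at roughly constant time, which matters when every read in a FASTQ file has to be tested.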
The sci-rna-seq3 sequencing construct is organized in the following way:
            ┌─'illumina_p5:29'
            ├─'i5:10'
            ├─'truseq_read_1_adapter:33'
            │                            ┌─'hairpin_barcode:10'
            │                            ├─'hairpin_adapter:6'
            ├─read_1─────────────────────┤
            │                            ├─'umi:8'
──RNA───────┤                            └─'cell_bc:10'
            ├─'poly_T:98'
            ├─'read_2:98'
            │                            ┌─'ME:19'
            ├─i7_primer──────────────────┤
            │                            └─'s7:15'
            ├─'i7:10'
            └─'illumina_p7:24'
Visualization from seqspec.
And so the resulting R1 and R2 files boil down to:
# R1
[linker][adapter][umi][barcode]
# R2
[cDNA]
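To make the R1 layout concrete, here is a small sketch that splits an R1 sequence at fixed offsets and rebuilds a 10X-style `[barcode][UMI]` sequence. The segment lengths (10, 6, 8, 10) are taken from the construct diagram above. Everything else is an assumption for illustration: `R1Segments`, `split_r1`, and `to_10x_style` are hypothetical names, fixed offsets may not hold for every read, and combining the hairpin barcode with the cell barcode is a guess rather than scispeak's documented behavior.

```rust
// Segment lengths taken from the construct diagram.
const HAIRPIN_BARCODE_LEN: usize = 10;
const HAIRPIN_ADAPTER_LEN: usize = 6;
const UMI_LEN: usize = 8;
const CELL_BC_LEN: usize = 10;

/// The four R1 segments, borrowed from the input sequence (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct R1Segments<'a> {
    pub hairpin_barcode: &'a str,
    pub hairpin_adapter: &'a str,
    pub umi: &'a str,
    pub cell_bc: &'a str,
}

/// Split an R1 sequence at fixed offsets.
/// Returns `None` if the read is too short or contains non-ASCII bytes.
pub fn split_r1(seq: &str) -> Option<R1Segments<'_>> {
    let a = HAIRPIN_BARCODE_LEN;
    let b = a + HAIRPIN_ADAPTER_LEN;
    let c = b + UMI_LEN;
    let d = c + CELL_BC_LEN;
    if !seq.is_ascii() || seq.len() < d {
        return None;
    }
    Some(R1Segments {
        hairpin_barcode: &seq[..a],
        hairpin_adapter: &seq[a..b],
        umi: &seq[b..c],
        cell_bc: &seq[c..d],
    })
}

/// Build a 10X-style `[barcode][UMI]` sequence.
/// Assumption: the barcode is the hairpin barcode followed by the cell barcode.
pub fn to_10x_style(seg: &R1Segments<'_>) -> String {
    let mut out = String::with_capacity(HAIRPIN_BARCODE_LEN + CELL_BC_LEN + UMI_LEN);
    out.push_str(seg.hairpin_barcode);
    out.push_str(seg.cell_bc);
    out.push_str(seg.umi);
    out
}
```

In a real converter the quality string would have to be sliced with the same offsets so that bases and quality scores stay aligned.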
This is a single-command CLI tool. It requires only the R1 and R2 file paths:
scispeak \
-i data/SRR7827205_sample_R1.fastq.gz \
-I data/SRR7827205_sample_R2.fastq.gz;
However, it can be accelerated using multiple compression threads:
scispeak \
-i data/SRR7827205_sample_R1.fastq.gz \
-I data/SRR7827205_sample_R2.fastq.gz \
-t 8;
It can also write a log file that records the matching statistics:
scispeak \
-i data/SRR7827205_sample_R1.fastq.gz \
-I data/SRR7827205_sample_R2.fastq.gz \
-t 8 \
-l;
This program will output 3 files per run:
<args.prefix>_R1.fastq.gz: A fastq with the [barcode][UMI] construct for all reads passing the whitelist.
<args.prefix>_R2.fastq.gz: An unaltered fastq of the R2 for all reads passing the whitelist.
<args.prefix>_log.json: A log file containing the filtering statistics of the run.
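The naming scheme for the three outputs can be illustrated with a short sketch. The `OutputPaths` type and `output_paths` function are hypothetical and only mirror the `<args.prefix>_…` pattern listed above. How the prefix is actually supplied on the command line is not shown here, so it is treated as a plain string.

```rust
use std::path::PathBuf;

/// The three output files derived from one prefix (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct OutputPaths {
    pub r1: PathBuf,
    pub r2: PathBuf,
    pub log: PathBuf,
}

/// Derive the output file names from a prefix, following the
/// `<prefix>_R1.fastq.gz`, `<prefix>_R2.fastq.gz`, `<prefix>_log.json` pattern.
pub fn output_paths(prefix: &str) -> OutputPaths {
    OutputPaths {
        r1: PathBuf::from(format!("{}_R1.fastq.gz", prefix)),
        r2: PathBuf::from(format!("{}_R2.fastq.gz", prefix)),
        log: PathBuf::from(format!("{}_log.json", prefix)),
    }
}
```

For example, a prefix of `out/sample` would yield `out/sample_R1.fastq.gz`, `out/sample_R2.fastq.gz`, and `out/sample_log.json`.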