| Field | Value |
|---|---|
| Crates.io | spikeq |
| lib.rs | spikeq |
| version | 1.0.0 |
| source | src |
| created_at | 2024-11-24 06:13:17.176709 |
| updated_at | 2024-11-24 06:13:17.176709 |
| description | A synthetic FASTQ record generator with pattern spiking |
| homepage | https://github.com/Rbfinch/spikeq |
| repository | https://github.com/Rbfinch/spikeq |
| id | 1458978 |
| size | 2,269,763 |
Generates synthetic FASTQ records that are free of sequences defined by regex patterns, or that contain spiked sequences based on regex patterns. The regex patterns are supplied in a JSON file (see the `schema.json` file in the `examples` directory). Spiked sequences are produced with the `spike-sequence` subcommand, resulting in a FASTQ file in which a subset of sequences contains the inserted patterns, and a summary file of the inserted patterns.

`spikeq` can be used to test bioinformatics tools that process FASTQ files, such as `grepq` (https://github.com/Rbfinch/grepq).
Run `spikeq -h` for instructions and examples, and `spikeq spike-sequence -h` for help on the `spike-sequence` subcommand.
> [!NOTE]
> The regex patterns should include only the DNA sequence characters (A, C, G, T), not IUPAC ambiguity codes (N, R, Y, etc.). If your regex patterns contain any IUPAC ambiguity codes, transform them to DNA sequence characters (A, C, G, T) before using them with `spikeq`. See `regex.json` in the `examples` directory for an example of a valid pattern file.
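One way to do this transformation is to replace each ambiguity code with a character class of the bases it stands for (for example, `N` becomes `[ACGT]` and `R` becomes `[AG]`). The sketch below is a hypothetical helper, not part of `spikeq`; the names `iupac_bases` and `expand_iupac` are illustrative. It uses the standard IUPAC nucleotide table, assumes upper-case patterns, leaves escaped characters such as `\D` untouched, and inside an existing `[...]` class emits the bases without extra brackets.

```rust
/// Hypothetical helper (not part of spikeq): the DNA bases an IUPAC
/// nucleotide ambiguity code stands for, or None for any other character.
fn iupac_bases(c: char) -> Option<&'static str> {
    match c {
        'R' => Some("AG"),
        'Y' => Some("CT"),
        'S' => Some("CG"),
        'W' => Some("AT"),
        'K' => Some("GT"),
        'M' => Some("AC"),
        'B' => Some("CGT"),
        'D' => Some("AGT"),
        'H' => Some("ACT"),
        'V' => Some("ACG"),
        'N' => Some("ACGT"),
        _ => None,
    }
}

/// Rewrites a regex pattern so that IUPAC codes become A/C/G/T classes.
/// Assumes upper-case patterns; does not understand every regex construct
/// (e.g. an upper-case named group like `(?P<N>...)` would be altered).
fn expand_iupac(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut in_class = false; // inside a [...] character class?
    let mut escaped = false; // previous character was a backslash?
    for c in pattern.chars() {
        if escaped {
            // Keep regex escapes such as \D or \S exactly as written.
            out.push(c);
            escaped = false;
            continue;
        }
        match c {
            '\\' => {
                escaped = true;
                out.push(c);
            }
            '[' => {
                in_class = true;
                out.push(c);
            }
            ']' => {
                in_class = false;
                out.push(c);
            }
            _ => match iupac_bases(c) {
                // Already inside a class: just add the bases.
                Some(bases) if in_class => out.push_str(bases),
                // Outside a class: wrap the bases in a new class.
                Some(bases) => {
                    out.push('[');
                    out.push_str(bases);
                    out.push(']');
                }
                None => out.push(c),
            },
        }
    }
    out
}

fn main() {
    let pattern = "GGNCCR";
    // prints "GGNCCR -> GG[ACGT]CC[AG]"
    println!("{} -> {}", pattern, expand_iupac(pattern));
}
```

Review the rewritten patterns before adding them to your pattern file, since this sketch only handles the common cases described above.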
`spikeq` has been tested on Linux and macOS. It might work on Windows, but it has not been tested on that platform. Before installing, make sure your Rust toolchain is up to date by running `rustup update`.
From crates.io (easiest method):

- Run `cargo install spikeq`

From source:

1. Clone the repository and `cd` into the `spikeq` directory.
2. Run `cargo build --release`.
3. The executable is placed in `./target/release`, relative to the root of the cloned repository.
4. Make sure the executable is in your `PATH`, or use the full path to the executable.
```bash
# Generate 1000 synthetic FASTQ records with sequence lengths between 200 and 800 that are free of the regex patterns in regex.json
# (in this example run, the output FASTQ file was named `459cac6f-8d65-48ed-99aa-f03930b3c02f.fastq`).
spikeq -r regex.json -n 1000 -l 200,800

# As above, then insert two patterns generated from regex.json into 10 of the sequences
# (in this example run, the output FASTQ file was named `4b1f92dc-14e1-496f-a68b-d1683251d827.fastq`, and the summary file `inserted.json`).
spikeq -r regex.json -n 1000 -l 200,800 spike-sequence --num-patterns 2 --num-sequences 10
```
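Because the generated files are meant for testing other tools, it can help to sanity-check one before use. The hypothetical sketch below (not part of `spikeq`; `FastqStats` and `fastq_stats` are illustrative names) parses four-line FASTQ records and reports the record count and the shortest and longest sequence, which you can compare against the `-n` and `-l` values you passed. It assumes each record is exactly four lines with no wrapped sequence or quality lines; that is an assumption about the output format, not documented behaviour.

```rust
use std::env;
use std::fs;

/// Summary of a FASTQ file (hypothetical helper, not part of spikeq).
#[derive(Debug, PartialEq)]
struct FastqStats {
    records: usize,
    min_len: usize,
    max_len: usize,
}

/// Parses unwrapped four-line FASTQ records (header, sequence, `+`, quality)
/// and returns summary statistics, or a message describing the first problem.
fn fastq_stats(text: &str) -> Result<FastqStats, String> {
    let lines: Vec<&str> = text.lines().collect();
    if lines.len() % 4 != 0 {
        return Err(format!("{} lines is not a multiple of 4", lines.len()));
    }
    let mut stats = FastqStats { records: 0, min_len: usize::MAX, max_len: 0 };
    for (i, rec) in lines.chunks(4).enumerate() {
        let (header, seq, plus, qual) = (rec[0], rec[1], rec[2], rec[3]);
        if !header.starts_with('@') || !plus.starts_with('+') {
            return Err(format!("record {}: bad header or separator line", i + 1));
        }
        if seq.len() != qual.len() {
            return Err(format!("record {}: sequence and quality lengths differ", i + 1));
        }
        stats.records += 1;
        stats.min_len = stats.min_len.min(seq.len());
        stats.max_len = stats.max_len.max(seq.len());
    }
    if stats.records == 0 {
        stats.min_len = 0; // no records: report 0 rather than usize::MAX
    }
    Ok(stats)
}

fn main() {
    // The FASTQ file to check is passed as the first argument.
    let Some(path) = env::args().nth(1) else {
        eprintln!("usage: fastq_stats <file.fastq>");
        return;
    };
    match fs::read_to_string(&path) {
        Ok(text) => match fastq_stats(&text) {
            Ok(s) => println!("{}: {} records, lengths {} to {}", path, s.records, s.min_len, s.max_len),
            Err(e) => eprintln!("{}: malformed FASTQ: {}", path, e),
        },
        Err(e) => eprintln!("could not read {}: {}", path, e),
    }
}
```

Compiled as a standalone program (for example with `rustc -O fastq_stats.rs`), it takes the path of a generated FASTQ file as its only argument.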
If you use `spikeq` in your research, please cite as follows:

Crosbie, N.D. (2024). spikeq: A synthetic FASTQ record generator with pattern spiking. 10.5281/zenodo.14211052.
See the CHANGELOG.
The logo was created using Inkscape and is based on the Thorn Helix SVG Vector at SVGRepo (https://www.svgrepo.com/svg/321583/thorn-helix).
MIT