| Crates.io | motif-scanner |
| lib.rs | motif-scanner |
| version | 0.1.1 |
| created_at | 2024-12-18 20:00:19.584306+00 |
| updated_at | 2025-02-24 01:57:25.186616+00 |
| description | Command line tool for scanning DNA sequences for transcription factor binding sites |
| id | 1488409 |
| size | 119,288 |
A command-line tool for scanning DNA sequences and predicting transcription factor binding sites.
Install from crates.io:
cargo install motif-scanner
Or install from source:
git clone https://github.com/peter6866/tf-binding-rs
cd tf-binding-rs
cargo install --path motif-scanner
Basic usage:
motif-scanner input.csv motifs.meme output.csv
With options:
motif-scanner input.csv motifs.meme output.parquet --cutoff 0.3 --mu 12
DATA_FILE: Input CSV file containing sequences (must have a 'sequence' column)
PWM_FILE: MEME format file containing Position Weight Matrices
OUTPUT_FILE: Path for output file (.csv or .parquet format)
--cutoff: Minimum occupancy threshold (default: 0.2)
--mu: Chemical potential parameter (default: 9)

The input CSV file must contain a column named 'sequence' with DNA sequences:
id,sequence
seq1,ATCGATCGTGCTAGCTA
seq2,GCTAGCTAGCTAGCTAG
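
Before running a large scan, it can help to confirm that the input file has the required 'sequence' column. The following is a minimal sketch using only the Python standard library; it is not part of motif-scanner. The function name `check_input` is hypothetical, and the restriction to the characters A, C, G and T is an assumption, since the tool may accept other symbols.

```python
import csv
import io

# Assumption: plain DNA alphabet; motif-scanner itself may accept more symbols.
VALID_BASES = set("ACGT")

def check_input(handle):
    """Return a list of problems found in a CSV handle; an empty list means OK."""
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or "sequence" not in reader.fieldnames:
        return ["missing required 'sequence' column"]
    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for row_number, row in enumerate(reader, start=2):
        seq = (row["sequence"] or "").strip().upper()
        if not seq:
            problems.append(f"line {row_number}: empty sequence")
        elif set(seq) - VALID_BASES:
            problems.append(f"line {row_number}: unexpected characters")
    return problems

sample = "id,sequence\nseq1,ATCGATCGTGCTAGCTA\nseq2,GCTAGCTAGCTAGCTAG\n"
print(check_input(io.StringIO(sample)))
```

To check a file on disk, pass an open file handle instead of the `io.StringIO` wrapper, for example `check_input(open("input.csv", newline=""))`.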
The tool generates a table with the following columns:
label: Sequence index from input file
position: Position of the binding site
motif: Name of the transcription factor
strand: Binding strand (F/R)
length: Length of the motif
occupancy: Predicted occupancy score

# Scan sequences with default parameters
motif-scanner sequences.csv pwm.meme results.csv
# Use stricter threshold and higher chemical potential
motif-scanner sequences.csv pwm.meme results.parquet --cutoff 0.4 --mu 15
# Process and save as Parquet format
motif-scanner data.csv motifs.meme output.parquet
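
Once a scan has written a CSV file, the results can be post-processed with any CSV reader. The sketch below keeps only sites above a chosen occupancy. It assumes the CSV header uses the column names listed above exactly as written; the function name `strong_sites` and the example rows are made up for illustration and are not real motif-scanner output.

```python
import csv
import io

def strong_sites(handle, min_occupancy):
    """Yield output rows whose occupancy is at least min_occupancy."""
    for row in csv.DictReader(handle):
        if float(row["occupancy"]) >= min_occupancy:
            yield row

# Made-up rows for illustration only; real values come from motif-scanner.
example = io.StringIO(
    "label,position,motif,strand,length,occupancy\n"
    "0,3,TF_A,F,8,0.72\n"
    "0,9,TF_B,R,10,0.25\n"
)

for site in strong_sites(example, 0.5):
    print(site["label"], site["motif"], site["position"], site["occupancy"])
```

For a real run, replace `example` with `open("results.csv", newline="")`. Parquet output would need a separate reader and is not covered here.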
The tool uses parallel processing for efficient scanning of large sequence datasets. Memory usage scales with the number of input sequences and motifs being scanned.
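
For intuition about how --mu and --cutoff interact, here is a sketch of a common thermodynamic occupancy model, in which a site with relative binding energy E is occupied with probability 1 / (1 + exp(E - mu)). Whether motif-scanner uses exactly this form, and the units of the energy, are assumptions; the function names are hypothetical. Only the default values 9 and 0.2 come from the option descriptions above.

```python
import math

def occupancy(relative_energy, mu):
    """Assumed model: logistic occupancy, with energy and mu in the same units."""
    return 1.0 / (1.0 + math.exp(relative_energy - mu))

def passes_cutoff(relative_energy, mu=9.0, cutoff=0.2):
    """True if the assumed occupancy reaches the cutoff (defaults as documented)."""
    return occupancy(relative_energy, mu) >= cutoff

# Under this model, a site whose energy equals mu is half occupied.
print(occupancy(9.0, 9.0))
# Raising mu raises the occupancy of the same site, so more sites pass a fixed cutoff.
print(passes_cutoff(12.0, mu=9.0), passes_cutoff(12.0, mu=12.0))
```

Under this assumed model, a larger --mu makes weaker sites count as occupied, while a larger --cutoff discards them, so the two options pull in opposite directions.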