| Crates.io | vcf-reformatter |
| lib.rs | vcf-reformatter |
| version | 0.3.0 |
| created_at | 2025-07-22 20:44:15.88414+00 |
| updated_at | 2025-08-15 19:53:17.178084+00 |
| description | Fast VCF file parser and reformatter with VEP and SnpEff annotation support which can output to MAF |
| homepage | https://github.com/flalom/vcf-reformatter/blob/main/README.md |
| repository | https://github.com/flalom/vcf-reformatter |
| max_upload_size | |
| id | 1763981 |
| size | 237,894 |
Ever had a VCF file and wished you could browse it like an ordinary table? VCF Reformatter comes to the rescue!
A Rust command-line tool for parsing and reformatting VCF (Variant Call Format) files, with support for VEP (Variant Effect Predictor) and SnpEff annotations. It flattens complex VCF files into tab-separated values (TSV) format for easier downstream analysis, and it is also handy for quick checks of your data.
Transform complex VCF files into clean, analyzable tables with ease
A high-performance Rust tool for flattening VCF files with intelligent VEP and SnpEff annotation handling
# Download binary from releases (easiest! You download and use it)
wget https://github.com/flalom/vcf-reformatter/releases/latest/download/vcf-reformatter-v0.3.0-linux-x86_64
chmod +x vcf-reformatter-v0.3.0-linux-x86_64
# Transform your VCF file
./vcf-reformatter-v0.3.0-linux-x86_64 sample.vcf.gz
# Generate MAF output ⚠️ (in beta!)
./vcf-reformatter-v0.3.0-linux-x86_64 sample.vcf.gz --output-format maf
OR Via Bioconda
conda install -c bioconda vcf-reformatter
# or
# mamba install vcf-reformatter -c bioconda
OR install from crates.io:
cargo install vcf-reformatter
OR build from source (you need Rust toolchain):
git clone https://github.com/flalom/vcf-reformatter.git
cd vcf-reformatter
cargo build --release
./target/release/vcf-reformatter sample.vcf.gz
MAF output is currently in beta testing (v0.3.0). Known limitations:
Memory considerations for MAF:
Files >100K variants: Monitor memory usage
Files >1M variants: Ensure adequate RAM (16GB+)
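To see which of these memory tiers a file falls into before running a MAF conversion, one option is to count its data records first. The sketch below is not part of VCF Reformatter; the function names are hypothetical, and the record count is only a proxy for the variant count, since one VCF line can carry several ALT alleles.

```python
import gzip


def count_records(lines):
    """Count VCF data records, skipping '#' header lines and blank lines."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


def count_records_in_file(path):
    """Open a plain or gzip-compressed VCF and count its data records."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_records(handle)


def memory_advice(n_variants):
    """Map a count onto the guidance above (thresholds copied from this README)."""
    if n_variants > 1_000_000:
        return "ensure adequate RAM (16GB+)"
    if n_variants > 100_000:
        return "monitor memory usage"
    return "no specific guidance"


# In-memory example: two header lines and one data record.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t69511\t.\tA\tG\t1294.53\t.\tDP=65\n",
]
print(count_records(example))  # 1
print(memory_advice(250_000))  # monitor memory usage
```

For a real file, `memory_advice(count_records_in_file("sample.vcf.gz"))` gives the matching tier without loading the file into memory.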
The Problem: VCF files are notoriously difficult to analyze. Complex nested annotations, semicolon-separated INFO fields, and multi-transcript VEP annotations make downstream analysis a nightmare.
The Solution: VCF Reformatter flattens everything into a clean, readable TSV format that works with Excel, R, Python, and any other analysis tool (⚠️ beware of Excel auto-correction!).
Before (Raw VCF):
chr1 69511 . A G 1294.53 . DP=65;AF=1;CSQ=G|missense_variant|MODERATE|OR4F5|ENSG00000186092...
After (Reformatted TSV):
CHROM POS REF ALT QUAL INFO_DP INFO_AF CSQ_Allele CSQ_Consequence CSQ_SYMBOL
chr1 69511 A G 1294.53 65 1 G missense_variant OR4F5
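The idea behind this before/after pair can be sketched in a few lines of Python. This is an illustration of the flattening concept only, not the tool's Rust implementation: `flatten_record` is a hypothetical helper, the CSQ sub-field names are passed in by hand (in a real VEP file they come from the CSQ header line), only the first transcript is kept, and the CSQ string is a shortened version of the example above.

```python
def flatten_record(line, csq_fields):
    """Flatten one VCF data line into a dict of column name -> value."""
    chrom, pos, _id, ref, alt, qual, _filter, info = line.rstrip("\n").split("\t")[:8]
    row = {"CHROM": chrom, "POS": pos, "REF": ref, "ALT": alt, "QUAL": qual}
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "CSQ":
            # Transcripts are comma-separated; sub-fields are pipe-separated.
            first_transcript = value.split(",")[0]
            for name, item in zip(csq_fields, first_transcript.split("|")):
                row[f"CSQ_{name}"] = item
        else:
            # Flags without '=' get an empty value here (an arbitrary choice).
            row[f"INFO_{key}"] = value
    return row


line = "chr1\t69511\t.\tA\tG\t1294.53\t.\tDP=65;AF=1;CSQ=G|missense_variant|MODERATE|OR4F5"
row = flatten_record(line, ["Allele", "Consequence", "IMPACT", "SYMBOL"])
print("\t".join(row))           # column names
print("\t".join(row.values()))  # one flattened row
```

Each `KEY=VALUE` pair in INFO becomes an `INFO_KEY` column and each CSQ sub-field becomes a `CSQ_Name` column, which is the naming pattern visible in the reformatted output.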
| Feature | Description | Benefit |
|---|---|---|
| VEP/SnpEff Annotation Parsing | Intelligent handling of CSQ/ANN annotations | No more manual parsing of complex VEP/SnpEff output |
| Automatic Annotation Recognition | Automatic detection of CSQ/ANN annotations | Saving even more time now for both VEP and SnpEff |
| Smart Transcript Handling | Most severe, first only, or split transcripts | Choose the analysis approach that fits your needs |
| Parallel Processing | Multi-threaded processing up to 30k variants/sec | Process large cohorts in minutes, not hours |
| Native Compression | Direct .vcf.gz reading & gzip output | Seamless workflow with compressed/uncompressed files |
| Production Ready | Comprehensive error handling & logging | Reliable for automated pipelines |
| Container Support | Docker & Singularity ready | Deploy anywhere, from laptops to HPC clusters |
No Rust installation required - just download and run:
Go to Releases
Download the binary for your platform:
- vcf-reformatter-v0.3.0-linux-x86_64 - Linux (most users)
- vcf-reformatter-v0.3.0-linux-x86_64-static - HPC clusters (works everywhere)
- vcf-reformatter-v0.3.0-windows-x86_64.exe - Windows
- vcf-reformatter-v0.3.0-macos-x86_64 - Intel Mac
- vcf-reformatter-v0.3.0-macos-arm64 - Apple Silicon Mac (M1/M2/M3/M4)

Make executable and run:
# Linux/Mac
chmod +x vcf-reformatter-*
./vcf-reformatter-* --help
# Windows
# Just double-click or run from command prompt
# C++ might be required, if not already installed
git clone https://github.com/flalom/vcf-reformatter.git
cd vcf-reformatter
cargo build --release
# Build the container
docker build -t vcf-reformatter .
# Run with your data
docker run --rm -v $(pwd):/data vcf-reformatter /data/sample.vcf.gz
# Build Singularity image
singularity build vcf-reformatter.sif Singularity
# Run on HPC cluster
singularity run --bind $PWD:/data vcf-reformatter.sif /data/sample.vcf.gz -j 16
# Simple conversion
vcf-reformatter input.vcf.gz
# Most severe consequence only (recommended for analysis)
vcf-reformatter input.vcf.gz -t most-severe
# All transcripts in separate rows (comprehensive)
vcf-reformatter input.vcf.gz -t split
# Auto-detect annotation type (recommended)
vcf-reformatter input.vcf.gz -a auto
# Force VEP processing
vcf-reformatter vep_annotated.vcf.gz -a vep -t most-severe
# Force SnpEff processing
vcf-reformatter snpeff_annotated.vcf.gz -a snpeff -t most-severe
# High-performance processing with compression
vcf-reformatter large_cohort.vcf.gz \
--transcript-handling most-severe \
--threads 0 \
--compress \
--output-dir results/ \
--prefix my_analysis \
--verbose
# Optimized for HPC environments
vcf-reformatter huge_dataset.vcf.gz -t most-severe -j 32 -o /scratch/results/ -c -v
Usage: vcf-reformatter [OPTIONS] <INPUT_FILE>
Arguments:
<INPUT_FILE> Input VCF file (supports .vcf.gz)
Options:
--output-format <FORMAT> Output format [default: tsv]
[values: tsv, maf]
--center <CENTER> Sequencing center for MAF output
--ncbi-build <BUILD> Genome build
[default: GRCh38]
--sample-barcode <BARCODE> Sample identifier for MAF output
-t, --transcript-handling <MODE> How to handle multiple transcripts
[default: first]
[values: most-severe, first, split]
-a, --annotation-type <N> Which annotations to parse VEP/SnpEff
[default: auto]
[values: snpeff, vep, auto]
-j, --threads <N> Thread count (0 = auto-detect) [default: 1]
-o, --output-dir <DIR> Output directory [default: current]
-p, --prefix <PREFIX> Output file prefix [default: input filename]
-c, --compress Compress output with gzip
-v, --verbose Detailed performance statistics
-h, --help Show help
-V, --version Show version
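When the tool is driven from a script rather than typed by hand, it can help to assemble the argument list programmatically. The following sketch uses only the long options listed in the usage text above; `build_command` is a hypothetical helper, and its defaults (`most-severe`, `--threads 0`) follow the recommended examples in this README rather than the CLI's own defaults (`first`, 1 thread).

```python
import shlex

VALID_MODES = {"most-severe", "first", "split"}


def build_command(input_file, transcript_handling="most-severe", threads=0,
                  output_dir=None, prefix=None, compress=False, verbose=False):
    """Return the argument list for one vcf-reformatter run."""
    if transcript_handling not in VALID_MODES:
        raise ValueError(f"unknown transcript handling mode: {transcript_handling}")
    cmd = ["vcf-reformatter", str(input_file),
           "--transcript-handling", transcript_handling,
           "--threads", str(threads)]
    if output_dir is not None:
        cmd += ["--output-dir", str(output_dir)]
    if prefix is not None:
        cmd += ["--prefix", prefix]
    if compress:
        cmd.append("--compress")
    if verbose:
        cmd.append("--verbose")
    return cmd


cmd = build_command("large_cohort.vcf.gz", output_dir="results/",
                    prefix="my_analysis", compress=True, verbose=True)
print(shlex.join(cmd))

# To actually run it (requires vcf-reformatter on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with unusual file names, and validating the mode up front fails fast before any process is started.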
VCF files with VEP annotations often contain multiple transcript annotations per variant. Choose the strategy that fits your analysis:
Most severe (--transcript-handling most-severe)
Best for: Clinical analysis, variant prioritization
vcf-reformatter input.vcf.gz -t most-severe
# for maf output
vcf-reformatter input.vcf.gz -t most-severe --output-format maf
Selects the transcript with the most severe consequence (stop_gained > missense_variant > synonymous_variant, etc.).
First only (--transcript-handling first) [Default]
Best for: Quick analysis, performance-critical workflows
vcf-reformatter input.vcf.gz # Uses first transcript by default
Processes only the first transcript annotation (fastest option)
Split (--transcript-handling split)
Best for: Comprehensive analysis, transcript-level studies
vcf-reformatter input.vcf.gz -t split
Creates separate rows for each transcript (most detailed output)
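The difference between the three modes can be illustrated with a small Python sketch. It is a conceptual model only, not the tool's implementation: `select_transcripts` and `SEVERITY_RANK` are hypothetical names, and the ranking covers just the three consequences mentioned above, with anything unranked treated as least severe.

```python
# Illustrative severity ranks (lower = more severe), following the ordering
# mentioned above; the tool's full internal ranking is not reproduced here.
SEVERITY_RANK = {"stop_gained": 0, "missense_variant": 1, "synonymous_variant": 2}


def select_transcripts(annotations, mode="first"):
    """Return the annotations that would become output rows for one variant."""
    if not annotations:
        return []
    if mode == "first":
        return [annotations[0]]
    if mode == "split":
        return list(annotations)
    if mode == "most-severe":
        unknown = len(SEVERITY_RANK)  # unranked consequences sort last
        best = min(annotations,
                   key=lambda a: SEVERITY_RANK.get(a["Consequence"], unknown))
        return [best]
    raise ValueError(f"unknown mode: {mode}")


transcripts = [
    {"Consequence": "synonymous_variant", "Feature": "tx1"},
    {"Consequence": "stop_gained", "Feature": "tx2"},
    {"Consequence": "missense_variant", "Feature": "tx3"},
]
print([t["Feature"] for t in select_transcripts(transcripts, "first")])        # ['tx1']
print([t["Feature"] for t in select_transcripts(transcripts, "most-severe")])  # ['tx2']
print(len(select_transcripts(transcripts, "split")))                           # 3
```

In this model `first` and `most-severe` always yield one row per variant, while `split` yields one row per transcript, so its output grows with the number of annotated transcripts.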
# Auto-detect optimal thread count
vcf-reformatter input.vcf.gz -j 0
# For files > 10K variants, use parallel processing
vcf-reformatter input.vcf.gz -t most-severe -j 0 -v
# Combine with compression for large outputs
vcf-reformatter input.vcf.gz -t split -j 0 -c -v
VCF Reformatter generates two files:
- {prefix}_header.txt - Original VCF header and metadata
- {prefix}_reformatted.tsv - Flattened tabular data

The reformatted table contains the following column groups:
- Standard VCF columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER
- INFO fields: INFO_DP, INFO_AF, INFO_AC, etc.
- VEP annotations: CSQ_Allele, CSQ_Consequence, CSQ_SYMBOL, CSQ_Gene, etc.
- SnpEff annotations: ANN_Allele, ANN_Annotation_Impact, ANN_Gene_Name, ANN_Distance, etc.
- Sample data: SAMPLE1_GT, SAMPLE1_DP, SAMPLE1_AD, etc.

CHROM POS ID REF ALT QUAL FILTER INFO_DP CSQ_Consequence CSQ_SYMBOL SAMPLE1_GT
chr1 69511 . A G 1294.53 PASS 65 missense_variant OR4F5 1/1
chr1 69761 rs123 C T 892.15 PASS 42 synonymous_variant OR4F5 0/1
CHROM POS ID REF ALT QUAL FILTER INFO_DP ANN_Annotation ANN_Gene_Name SAMPLE1_GT
chr1 69761 rs587 C T 730 PASS . 214 synonymous_variant OR4F5 0/1
chr1 924024 . A G 53 PASS . 409 5_prime_UTR_variant SAMD11 1/1
# Read compressed output directly (fread needs the R.utils package for .gz files)
library(data.table)
data <- fread("output_reformatted.tsv.gz")
# Quick variant summary: number of rows per consequence type
table(data$CSQ_Consequence)
import pandas as pd
# Load and analyze
df = pd.read_csv("output_reformatted.tsv.gz", sep="\t", compression="gzip")
df['CSQ_Consequence'].value_counts()
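Where pandas is not available, for example on a minimal cluster node, the same tally can be produced with the Python standard library. This is a sketch under the assumption that the output is a tab-separated file with a header row, as shown in the examples above; `count_values` and `count_values_in_file` are hypothetical helper names.

```python
import csv
import gzip
import io
from collections import Counter


def count_values(handle, column):
    """Tally the values of one column from an open tab-separated text stream."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row[column] for row in reader)


def count_values_in_file(path, column):
    """Same as count_values, for a plain or gzip-compressed TSV file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        return count_values(handle, column)


# In-memory example standing in for a reformatted TSV.
sample = io.StringIO(
    "CHROM\tPOS\tCSQ_Consequence\n"
    "chr1\t69511\tmissense_variant\n"
    "chr1\t69761\tsynonymous_variant\n"
    "chr1\t69897\tmissense_variant\n"
)
print(count_values(sample, "CSQ_Consequence").most_common(1))  # [('missense_variant', 2)]
```

Because the file is streamed row by row, memory use stays flat regardless of file size, at the cost of pandas' convenience functions.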
# Nextflow pipeline
vcf-reformatter ${vcf} -t most-severe -j ${task.cpus} -o results/ -c
# Snakemake rule
shell: "vcf-reformatter {input.vcf} -t most-severe -j {threads} -o {params.outdir} -c"
# Build once
docker build -t vcf-reformatter .
# Run anywhere
docker run --rm \
-v $(pwd):/data \
vcf-reformatter \
/data/input.vcf.gz \
-t most-severe -j 4 -o /data/results/ -c
# On HPC cluster
singularity run \
--bind $PWD:/data \
--bind /scratch:/scratch \
vcf-reformatter.sif \
/data/large_cohort.vcf.gz \
-t most-severe -j 16 -o /scratch/results/ -c -v
| Use Case | Command | Why It Works |
|---|---|---|
| Clinical Variant Review | vcf-reformatter variants.vcf.gz -t most-severe | Prioritizes clinically relevant consequences |
| Population Analysis | vcf-reformatter cohort.vcf.gz -t first -j 0 -c | Fast processing of large cohorts |
| Transcript Studies | vcf-reformatter genes.vcf.gz -t split -v | Comprehensive transcript-level analysis |
| Quick Data Exploration | vcf-reformatter sample.vcf.gz | Simple, fast conversion for immediate analysis |
| HPC Batch Processing | vcf-reformatter huge.vcf.gz -t most-severe -j 32 -c | Optimized for high-performance computing |
stdin to combine with other tools, such as bcftools

We welcome contributions! Here's how to get started:
- git checkout -b feature-name
- git commit -am 'Add feature'
- git push origin feature-name

git clone https://github.com/flalom/vcf-reformatter.git
cd vcf-reformatter
cargo test # Run the test suite
cargo run -- data/sample.vcf.gz -v # Test with sample data
This project is licensed under the MIT License - see the LICENSE file for details.
- --transcript-handling most-severe
- --transcript-handling first
- --transcript-handling split

VCF Reformatter is specifically designed for:
Yes! VCF Reformatter is designed for production use with:
- vcf-reformatter file.vcf.gz -j 0 -c
- vcf-reformatter file.vcf.gz -v

⭐ Star this repo if VCF Reformatter helps your research!
Made with ❤️ by Flavio Lombardo