Crates.io | gtfjson |
lib.rs | gtfjson |
version | 0.1.6 |
source | src |
created_at | 2023-07-21 19:33:21.807712 |
updated_at | 2023-10-13 16:20:11.221448 |
description | A tool to convert GTF files to newline-delim JSON |
repository | https://github.com/noamteyssier/gtfjson |
id | 922608 |
size | 938,573 |
A simple CLI utility to convert a GTF file to NDJSON for fast parsing, and to perform other operations on the resulting JSON records.
The GTF file format is fantastic when working with bedtools since it is essentially a modified version of the BED file format.
However, if you're interested in the annotations column, it can be a massive headache to parse - especially if you're operating on the full genome.
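To illustrate what parsing the annotations column involves, here is a minimal Python sketch that turns one GTF line into a flat JSON record. It is not how gj is implemented; the function names, the column names, and the sample line are all assumptions made for illustration.

```python
import json

# Names for the eight fixed GTF columns (assumed labels, not necessarily gj's keys).
COLUMN_NAMES = ["seqname", "source", "feature", "start", "end", "score", "strand", "frame"]


def parse_attributes(field: str) -> dict:
    """Parse an attribute string such as 'gene_id "G1"; gene_name "ABC";'.

    This naive split assumes no semicolons appear inside quoted values.
    """
    attributes = {}
    for entry in field.strip().split(";"):
        entry = entry.strip()
        if not entry:
            continue  # skip the empty piece after the trailing semicolon
        key, _, value = entry.partition(" ")
        attributes[key] = value.strip().strip('"')
    return attributes


def gtf_line_to_record(line: str) -> dict:
    """Convert one tab-separated GTF line into a flat dictionary."""
    columns = line.rstrip("\n").split("\t")
    record = dict(zip(COLUMN_NAMES, columns[:8]))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    record.update(parse_attributes(columns[8]))
    return record


# A made-up GTF line, not taken from a real annotation.
line = 'chr1\texample\tgene\t100\t200\t.\t+\t.\tgene_id "G1"; gene_name "ABC";'
print(json.dumps(gtf_line_to_record(line)))
```

Doing this for every line of a full-genome annotation, on every load, is the cost that a one-time conversion avoids.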
I wrote this tool to convert the GTF file format into streamable newline-delimited JSON (NDJSON).
This makes it convenient and very fast to load in Python with polars, skipping all the annotation parsing.
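As a rough sketch of why NDJSON is easy to consume, the snippet below streams records one line at a time using only the Python standard library. The helper name read_ndjson and the field names in the sample records are assumptions; the actual keys depend on what gj writes.

```python
import io
import json


def read_ndjson(handle):
    """Yield one dictionary per non-empty line of an NDJSON stream."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical records standing in for a converted GTF file.
sample = io.StringIO(
    '{"seqname": "chr1", "feature": "gene", "gene_name": "ABC"}\n'
    '{"seqname": "chr1", "feature": "exon", "gene_name": "ABC"}\n'
)

records = list(read_ndjson(sample))
genes = [r for r in records if r["feature"] == "gene"]
print(len(records), len(genes))
```

With polars installed, `pl.read_ndjson("output.json")` should load the same kind of file into a DataFrame in a single call.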
You can install this with the Rust package manager cargo:
cargo install gtfjson
The executable of this tool is gj.
To convert a GTF file to NDJSON, we can use the convert subcommand:
# classic i/o
gj convert -i <input.gtf> -o output.json
# write to stdout
gj convert -i <input.gtf>
We can also use gj to partition a converted GTF (NDJSON) file in different ways.
It takes a variable from the attributes, creates a new file for each category (distinct value) of that variable, and populates each file with the records matching that category.
For example, we can write the records of every gene to a separate file:
# Partition on gene_name
gj partition -i <input.ndjson> -o partitions/ -v gene_name
# Partition on gene_id
gj partition -i <input.ndjson> -o partitions/ -v gene_id
# Partition on transcript_biotype
gj partition -i <input.ndjson> -o partitions/ -v transcript_biotype
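The idea behind partitioning can be sketched in a few lines of Python. This is only an illustration of the grouping logic, not gj's implementation: the function name, the `<value>.ndjson` file naming, and the choice to skip records that lack the variable are all assumptions.

```python
import json
import os
import tempfile
from collections import defaultdict


def partition_ndjson(lines, out_dir, variable):
    """Group NDJSON lines by record[variable] and write one file per value.

    Assumes values are safe to use as file names (no path separators).
    Records without the variable are skipped (an assumption of this sketch).
    """
    os.makedirs(out_dir, exist_ok=True)
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if variable in record:
            groups[str(record[variable])].append(line)
    for value, members in groups.items():
        path = os.path.join(out_dir, f"{value}.ndjson")
        with open(path, "w") as handle:
            handle.write("\n".join(members) + "\n")
    return sorted(groups)


# Hypothetical records; real keys depend on the converted file.
lines = [
    '{"feature": "gene", "gene_name": "ABC"}',
    '{"feature": "exon", "gene_name": "ABC"}',
    '{"feature": "gene", "gene_name": "XYZ"}',
]
out_dir = tempfile.mkdtemp()
categories = partition_ndjson(lines, out_dir, "gene_name")
print(categories)
```

This sketch collects all records in memory before writing, which keeps it short; a tool working on a full genome would more likely write records out as it reads them.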