| Field | Value |
|---|---|
| Crates.io | gtfjson |
| lib.rs | gtfjson |
| version | 0.1.6 |
| created_at | 2023-07-21 19:33:21.807712+00 |
| updated_at | 2023-10-13 16:20:11.221448+00 |
| description | A tool to convert GTF files to newline-delim JSON |
| homepage | |
| repository | https://github.com/noamteyssier/gtfjson |
| max_upload_size | |
| id | 922608 |
| size | 938,573 |
A simple CLI utility that converts GTF files to NDJSON for fast parsing and performs other operations on the resulting JSON files.
The GTF file format is fantastic when working with bedtools since it is essentially
a modified version of the BED file format.
However, if you're interested in the attributes (ninth) column, it can be a massive headache to parse, especially if you're operating on the full genome.
I wrote this tool to convert GTF files into streamable newline-delimited JSON (NDJSON).
This makes it easy to load the data very quickly with polars in Python and skip
all the attribute parsing.
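To make the conversion idea concrete, here is a minimal Python sketch (not gtfjson's Rust implementation) that turns one GTF record into one NDJSON line. It assumes the usual GTF layout of eight tab-separated columns followed by a `key "value";` attribute column. The JSON key names (`seqname`, `feature`, and so on) and the choice to flatten attributes into top-level keys are illustrative assumptions, not gtfjson's documented output schema.

```python
import json

# Names for the first eight tab-separated GTF columns.
# NOTE: hypothetical key names; gtfjson's real output schema may differ.
GTF_COLUMNS = ["seqname", "source", "feature", "start", "end", "score", "strand", "frame"]


def parse_attributes(field: str) -> dict:
    """Parse an attribute column like 'gene_id "G1"; gene_name "ABC";' into a dict."""
    attrs = {}
    for entry in field.strip().split(";"):
        entry = entry.strip()
        if not entry:
            continue  # trailing ';' leaves an empty chunk
        key, _, value = entry.partition(" ")
        attrs[key] = value.strip().strip('"')  # drop surrounding quotes
    return attrs


def gtf_line_to_json(line: str) -> str:
    """Convert one GTF record into a single-line JSON string (one NDJSON record)."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(GTF_COLUMNS, fields[:8]))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Hypothetical design choice: attributes become top-level keys.
    record.update(parse_attributes(fields[8]))
    return json.dumps(record)


def gtf_to_ndjson(lines):
    """Yield one JSON line per GTF record, skipping '#' header lines and blanks."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield gtf_line_to_json(line)
```

Two simplifications to be aware of: semicolons inside quoted values are not handled, and a repeated key (such as the multiple `tag` entries in GENCODE files) keeps only its last value.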
You can install it with Cargo, the Rust package manager:

```bash
cargo install gtfjson
```

The tool's executable is `gj`.
To convert a GTF file to NDJSON, use the `convert` subcommand:

```bash
# write to an output file
gj convert -i <input.gtf> -o output.json
# write to stdout
gj convert -i <input.gtf>
```
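Because every line of the output is a complete JSON object, the result of `gj convert` can be consumed one record at a time, for example when piped from stdout. The README's suggested route is polars (`pl.read_ndjson`, or `pl.scan_ndjson` for lazy loading); the stdlib-only Python sketch below illustrates the same streaming idea. The `feature` key used in the example is a hypothetical field name; check the actual output for the real keys.

```python
import collections
import json
from typing import Iterable, Iterator


def stream_records(lines: Iterable[str]) -> Iterator[dict]:
    """Decode NDJSON lazily: every non-blank line is an independent JSON object."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def count_by(records: Iterable[dict], key: str) -> collections.Counter:
    """Tally records by the value of `key`; records missing the key are skipped."""
    return collections.Counter(r[key] for r in records if key in r)
```

For instance, a script run as `gj convert -i <input.gtf> | python script.py` could call `count_by(stream_records(sys.stdin), "feature")` (after `import sys`) without ever holding the whole file in memory.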
`gj` can also partition a GTF-JSON file in different ways.
It takes an attribute variable, creates a new file for each distinct value of that variable, and writes every record with that value to the corresponding file.
For example, partitioning on `gene_name` writes the records of every gene to a separate file; `gene_id` and `transcript_biotype` work the same way:
```bash
# Partition on gene_name
gj partition -i <input.ndjson> -o partitions/ -v gene_name
# Partition on gene_id
gj partition -i <input.ndjson> -o partitions/ -v gene_id
# Partition on transcript_biotype
gj partition -i <input.ndjson> -o partitions/ -v transcript_biotype
```
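The partitioning behavior described above can be sketched in Python as follows. This illustrates the technique under stated assumptions and is not gtfjson's implementation: it assumes the partition variable is a top-level key in each JSON record, and the `<value>.json` file naming and the replacement of unsafe file-name characters are hypothetical choices.

```python
import json
import os
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def partition(lines: Iterable[str], out_dir: str, variable: str) -> Dict[str, int]:
    """Split NDJSON records into one file per distinct value of `variable`.

    Records without `variable` are skipped. Returns {file stem: record count}.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        value = json.loads(line).get(variable)  # assumes a flat record layout
        if value is None:
            continue
        # Hypothetical naming rule: replace characters unsafe in file names (e.g. '/').
        stem = re.sub(r"[^A-Za-z0-9._-]", "_", str(value))
        groups[stem].append(line.rstrip("\n"))

    os.makedirs(out_dir, exist_ok=True)
    for stem, records in groups.items():
        # One file per category, each record kept on its own line (still NDJSON).
        with open(os.path.join(out_dir, f"{stem}.json"), "w") as handle:
            handle.write("\n".join(records) + "\n")
    return {stem: len(records) for stem, records in groups.items()}
```

This version buffers all records in memory and then writes one file at a time. The trade-off is deliberate: partitioning on `gene_name` over a full genome can yield tens of thousands of categories, and keeping one open file handle per category could exceed the operating system's open-file limit.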