# gtfjson

A simple CLI utility that converts GTF files to NDJSON for fast parsing and performs other operations on the resulting JSON.

## Summary

The GTF file format works well with `bedtools` since, like `BED`, it is a tab-delimited interval format.
However, if you're interested in the annotations (attributes) column, it can be a massive headache to parse, especially if you're operating on the full genome.

I wrote this tool to convert the GTF file format into streamable newline-delimited JSON.
This makes it fast and convenient to load with `polars` in Python and lets you skip the annotation parsing entirely.

## Installation

You can install this with the Rust package manager `cargo`:

``` bash
cargo install gtfjson
```

## Usage

The executable of this tool is `gj`.

### Convert

To convert a GTF file to NDJSON, use the `convert` subcommand:

``` bash
# classic i/o
gj convert -i <input.gtf> -o output.json

# write to stdout
gj convert -i <input.gtf>
```

### Partition

We can also use `gj` to partition a gtf-json in different ways.
The `partition` subcommand takes a variable from the attributes, creates a new file for each distinct value of that variable, and populates each file with the records matching that value.

For example, we can write the records of every gene to a separate file:

``` bash
# Partition on gene_name
gj partition -i <input.json> -o partitions/ -v gene_name

# Partition on gene_id
gj partition -i <input.json> -o partitions/ -v gene_id

# Partition on transcript_biotype
gj partition -i <input.json> -o partitions/ -v transcript_biotype
```
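
To make the partitioning idea concrete, here is a minimal Python sketch of the grouping step using only the standard library. It is not how `gj` is implemented. It assumes each NDJSON line is a JSON object with the partition variable as a top-level key. The function name `partition_records`, the field names, and the sample records are all made up for illustration.

``` python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def partition_records(lines: Iterable[str], variable: str) -> Dict[str, List[dict]]:
    """Group NDJSON records by the value of `variable` (hypothetical helper)."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        # Assumption: records lacking the variable are skipped in this sketch.
        if variable in record:
            groups[str(record[variable])].append(record)
    return dict(groups)


# Made-up records; the field layout is an assumption, not the tool's documented output.
sample_lines = [
    '{"seqname": "chr1", "feature": "gene", "gene_name": "GENE_A"}',
    '{"seqname": "chr1", "feature": "exon", "gene_name": "GENE_A"}',
    '{"seqname": "chr2", "feature": "gene", "gene_name": "GENE_B"}',
]

for value, records in partition_records(sample_lines, "gene_name").items():
    print(value, len(records))
```

In practice the lines would come from iterating over an open NDJSON file rather than an in-memory list, and each group would be written to its own file.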