nu_plugin_bio

Crates.io: nu_plugin_bio
lib.rs: nu_plugin_bio
version: 0.85.0
source: src
created_at: 2022-10-25 20:29:41.674591
updated_at: 2023-10-18 10:30:40.822517
description: Parse and manipulate common bioinformatic formats in nushell.
repository: https://github.com/
id: 697174
size: 1,361,169
Max Brown (Euphrasiologist)

Nushell bio

A bioinformatics plugin for nushell. This plugin parses most common bioinformatics formats into structured data so you can use them with nushell more effectively.

Quick setup

First, go and get nushell; it's great. Then come back! The steps below assume you have the Rust toolchain installed.

# clone this repo
git clone https://github.com/Euphrasiologist/nu_plugin_bio
# change into the repo directory
cd nu_plugin_bio
# build
# it's quite a long compile time...
cargo build --release
# register the plugin (run this inside nushell; the path is relative
# to the repo directory you changed into above)
register ./target/release/nu_plugin_bio

# see the file formats currently supported below
# now you can just use open, and the file extension will be auto-detected.

# there are some test files in the tests/ dir.
open ./tests/test.fasta
    | get id

# if you want to add flags you have to explicitly use from <x>
# e.g. if you want descriptions in fasta files to be parsed.

open --raw ./tests/test.fasta 
    | from fasta -d
    | first

The backend wraps noodles, an excellent, all-Rust bioinformatics I/O library.
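
To make "structured data" a bit more concrete, here is a minimal, std-only Rust sketch of turning FASTA text into records. It is not the plugin's code (the plugin delegates parsing to noodles). The `FastaRecord` type, its field names, and `parse_fasta` are hypothetical, and the `parse_description` argument only mimics the spirit of the `-d` flag shown above.

```rust
/// One FASTA record as structured data.
/// Field names are illustrative assumptions, not the plugin's column names.
#[derive(Debug, PartialEq)]
pub struct FastaRecord {
    pub id: String,
    pub description: Option<String>,
    pub sequence: String,
}

/// Parse FASTA text into records.
/// When `parse_description` is false, anything after the first whitespace
/// on a header line is dropped.
pub fn parse_fasta(input: &str, parse_description: bool) -> Vec<FastaRecord> {
    let mut records = Vec::new();
    let mut current: Option<FastaRecord> = None;

    for line in input.lines() {
        let line = line.trim_end();
        if let Some(header) = line.strip_prefix('>') {
            // A new header closes the previous record, if there is one.
            if let Some(done) = current.take() {
                records.push(done);
            }
            // Split the header into the id and the (optional) description.
            let (id, rest) = match header.split_once(char::is_whitespace) {
                Some((id, rest)) => (id, Some(rest.trim())),
                None => (header, None),
            };
            let description = if parse_description {
                rest.filter(|d| !d.is_empty()).map(str::to_string)
            } else {
                None
            };
            current = Some(FastaRecord {
                id: id.to_string(),
                description,
                sequence: String::new(),
            });
        } else if let Some(record) = current.as_mut() {
            // Sequence lines may be wrapped, so concatenate them.
            record.sequence.push_str(line.trim());
        }
    }

    // Flush the last record.
    if let Some(done) = current.take() {
        records.push(done);
    }
    records
}
```

Calling `parse_fasta(text, true)` on a two-record file returns a vector of two `FastaRecord` values, which is roughly the shape a table of rows takes once it reaches nushell.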

Aims

The plugin aims to support the following formats:

  • BAM 1.6
  • BCF 2.2
    • bcf.gz
  • VCF 4.3
    • vcf.gz
  • BED (BED3 only right now)
  • CRAM 3.0
  • FASTA
    • fa.gz
  • FASTQ
    • fq.gz
  • GFF3
  • GTF 2.2
  • SAM 1.6
  • GFA 1.0
    • gfa.gz
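
Extension-based detection over the list above could look something like the following Rust sketch. It is an illustration only, not how the plugin or nushell actually dispatches. The `Format` enum and `detect_format` are hypothetical, the long-form extensions (`fasta`, `fastq`, `gff`, `gff3`) are assumptions, and for simplicity a trailing `.gz` is accepted for every format even though the list only names gzipped variants for some of them.

```rust
/// Formats named in the list above (hypothetical enum for illustration).
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Format {
    Bam,
    Bcf,
    Vcf,
    Bed,
    Cram,
    Fasta,
    Fastq,
    Gff,
    Gtf,
    Sam,
    Gfa,
}

/// Guess the format from a file name.
/// Returns the format and whether the name ends in `.gz`,
/// or `None` when the extension is not recognised.
pub fn detect_format(file_name: &str) -> Option<(Format, bool)> {
    let lower = file_name.to_ascii_lowercase();

    // Peel off a trailing ".gz" first, remembering that it was there.
    let (stem, gzipped) = match lower.strip_suffix(".gz") {
        Some(stem) => (stem, true),
        None => (lower.as_str(), false),
    };

    // The text after the last '.' is the extension.
    let ext = stem.rsplit_once('.')?.1;

    let format = match ext {
        "bam" => Format::Bam,
        "bcf" => Format::Bcf,
        "vcf" => Format::Vcf,
        "bed" => Format::Bed,
        "cram" => Format::Cram,
        "fa" | "fasta" => Format::Fasta,
        "fq" | "fastq" => Format::Fastq,
        "gff" | "gff3" => Format::Gff,
        "gtf" => Format::Gtf,
        "sam" => Format::Sam,
        "gfa" => Format::Gfa,
        _ => return None,
    };
    Some((format, gzipped))
}
```

Stripping `.gz` before looking at the extension keeps the match table small: each compressed variant reuses the entry for its uncompressed format.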

Note that performance is not optimal given the current state of nu_plugin: a plugin cannot access the engine state of nushell, so entire data structures have to be loaded into memory. Testing on large files still needs to be done.
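
The memory trade-off can be sketched in plain Rust, independent of nushell or noodles. In the hypothetical functions below, `fasta_ids` yields record ids lazily, holding only one line at a time, while `load_all_ids` materialises everything up front, which is closer to what the note above describes. Both names are invented for this illustration.

```rust
use std::io::BufRead;

/// Lazily yield the id of each FASTA record, reading one line at a time.
/// Only the current line is held in memory.
pub fn fasta_ids<R: BufRead>(reader: R) -> impl Iterator<Item = String> {
    reader
        .lines()
        // Stop at the first I/O error instead of retrying forever.
        .map_while(Result::ok)
        .filter_map(|line| {
            // Header lines start with '>'; the id is the first token.
            line.strip_prefix('>')
                .map(|header| header.split_whitespace().next().unwrap_or("").to_string())
        })
}

/// Eagerly collect every id into a vector.
/// Memory use grows with the number of records in the file.
pub fn load_all_ids<R: BufRead>(reader: R) -> Vec<String> {
    fasta_ids(reader).collect()
}
```

With the lazy version a consumer that only wants the first record can stop after one header; the eager version always reads the whole input first.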

More?

If there's a bioinformatics format you want added, let me know, or open a PR.
