Exon

Exon is an analysis toolkit for life-science applications. It features: * Support for many file formats from bioinformatics, proteomics, and others * Local filesystem and object storage support * Arrow FFI primitives for multi-language support * SQL based access to bioinformatics data -- general DML and some DDL support Please note Exon was recently excised from a larger library, so please be patient as we work to clean up after that. If you have a comment or question in the meantime, please file an issue. * [Installation](#installation) * [Usage](#usage) * [File Formats](#file-formats) * [Related Projects](#related-projects) * [Settings](#settings) * [Benchmarks](#benchmarks) ## Installation Exon is available via crates.io. To install, run: ```bash cargo add exon ``` ## Usage Exon is designed to be used as a library. For example, to read a FASTA file: ```rust use exon::context::ExonSessionExt; use datafusion::prelude::*; use datafusion::error::Result; let ctx = SessionContext::new_exon(); let df = ctx.read_fasta("test-data/datasources/fasta/test.fasta", None).await?; ``` Please see the [rust docs](https://docs.rs/exon) for more information. ## File Formats | Format | Compression(s) | Inferred Extension(s) | | --------- | -------------- | --------------------- | | BAM | - | .bam | | BCF | - | .bcf | | BED | gz, zstd | .bed | | FASTA | gz, zstd | .fasta, .fa, .fna | | FASTQ | gz, zstd | .fastq, .fq | | GENBANK | gz, zstd | .gbk, .genbank, .gb | | GFF | gz, zstd | .gff | | GTF | gz, zstd | .gtf | | HMMDOMTAB | gz, zstd | .hmmdomtab | | MZML | gz, zstd | .mzml[^2] | | SAM | - | .sam | | VCF | gz[^1] | .vcf | [^1]: Uses bgzip not gzip. [^2]: mzML also works. ## Related Projects * [Exon R Bindings](./exon-r/README.md) * [BioBear](https://www.github.com/wheretrue/biobear) ## Settings Exon using the following settings: | Setting | Default | Description | | ------- | ------- | ----------- | | `exon.vcf_parse_info` | `true` | Parse VCF INFO fields. If False, INFO fields will be returned as a single string. | | `exon.vcf_parse_formats` | `true` | Parse VCF FORMAT fields. If False, FORMAT fields will be returned as a single string. | You can update the settings by running: ```sql SET = ; ``` For example, to disable parsing of VCF INFO fields: ```sql SET exon.vcf_parse_info = false; ``` ## Benchmarks Please see the [benchmarks](exon-benchmarks) [README](exon-benchmarks/README.md) for more information.