Exon is an analysis toolkit for life-science applications. It features:
* Support for many file formats from bioinformatics, proteomics, and others
* Local filesystem and object storage support
* Arrow FFI primitives for multi-language support
* SQL-based access to bioinformatics data, with general DML and some DDL support
Note that Exon was recently extracted from a larger library, so some cleanup is still in progress. If you have a comment or question in the meantime, please file an issue.
* [Installation](#installation)
* [Usage](#usage)
* [File Formats](#file-formats)
* [Related Projects](#related-projects)
* [Settings](#settings)
* [Benchmarks](#benchmarks)
## Installation
Exon is available via crates.io. To install, run:
```bash
cargo add exon
```
## Usage
Exon is designed to be used as a library. For example, to read a FASTA file:
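```rust
use datafusion::error::Result;
use datafusion::prelude::*;
use exon::context::ExonSessionExt;
#[tokio::main] // assumes the `tokio` crate as the async runtime
async fn main() -> Result<()> {
    let ctx = SessionContext::new_exon(); // session with Exon's extensions
    let df = ctx.read_fasta("test-data/datasources/fasta/test.fasta", None).await?;
    df.show().await?; // print the records
    Ok(())
}
```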
Please see the [rust docs](https://docs.rs/exon) for more information.
## File Formats
| Format | Compression(s) | Inferred Extension(s) |
| --------- | -------------- | --------------------- |
| BAM | - | .bam |
| BCF | - | .bcf |
| BED | gz, zstd | .bed |
| FASTA | gz, zstd | .fasta, .fa, .fna |
| FASTQ | gz, zstd | .fastq, .fq |
| GENBANK | gz, zstd | .gbk, .genbank, .gb |
| GFF | gz, zstd | .gff |
| GTF | gz, zstd | .gtf |
| HMMDOMTAB | gz, zstd | .hmmdomtab |
| MZML | gz, zstd | .mzml[^2] |
| SAM | - | .sam |
| VCF | gz[^1] | .vcf |
[^1]: Uses bgzip, not plain gzip.
[^2]: The `.mzML` spelling also works.
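
To make the table concrete, here is a minimal sketch of how a path could be mapped to a format and compression. It is an illustration only, not Exon's actual inference code: the `Format` and `Compression` enums and the `infer` function are hypothetical names, and the `.gz` and `.zst` file suffixes are assumptions (the table names the compression types but not their suffixes).

```rust
/// Formats from the table above (hypothetical enum, not Exon's own type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Format {
    Bam, Bcf, Bed, Fasta, Fastq, Genbank, Gff, Gtf, Hmmdomtab, Mzml, Sam, Vcf,
}

/// Compression types from the table above (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Compression {
    None,
    Gzip,
    Zstd,
}

/// Infer the format and compression from a path, following the table.
/// Returns `None` when the extension or the combination is not listed.
pub fn infer(path: &str) -> Option<(Format, Compression)> {
    // Lowercase so that spellings such as `.mzML` match too.
    let lower = path.to_ascii_lowercase();

    // Peel off a trailing compression suffix (assumed to be `.gz` or `.zst`).
    let (stem, compression) = if let Some(s) = lower.strip_suffix(".gz") {
        (s, Compression::Gzip)
    } else if let Some(s) = lower.strip_suffix(".zst") {
        (s, Compression::Zstd)
    } else {
        (lower.as_str(), Compression::None)
    };

    // The remaining extension selects the format.
    let ext = stem.rsplit_once('.')?.1;
    let format = match ext {
        "bam" => Format::Bam,
        "bcf" => Format::Bcf,
        "bed" => Format::Bed,
        "fasta" | "fa" | "fna" => Format::Fasta,
        "fastq" | "fq" => Format::Fastq,
        "gbk" | "genbank" | "gb" => Format::Genbank,
        "gff" => Format::Gff,
        "gtf" => Format::Gtf,
        "hmmdomtab" => Format::Hmmdomtab,
        "mzml" => Format::Mzml,
        "sam" => Format::Sam,
        "vcf" => Format::Vcf,
        _ => return None,
    };

    // Reject combinations the table does not list.
    let allowed = match (format, compression) {
        (_, Compression::None) => true,
        (Format::Bam | Format::Bcf | Format::Sam, _) => false,
        (Format::Vcf, Compression::Zstd) => false,
        _ => true,
    };
    if allowed { Some((format, compression)) } else { None }
}
```

Under these assumptions, `infer("reads.fq.gz")` yields FASTQ with gzip, while `infer("calls.vcf.zst")` yields `None` because the table lists only gz for VCF.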
## Related Projects
* [Exon R Bindings](./exon-r/README.md)
* [BioBear](https://www.github.com/wheretrue/biobear)
## Settings
Exon uses the following settings:
| Setting | Default | Description |
| ------- | ------- | ----------- |
| `exon.vcf_parse_info` | `true` | Parse VCF INFO fields. If `false`, INFO fields are returned as a single string. |
| `exon.vcf_parse_formats` | `true` | Parse VCF FORMAT fields. If `false`, FORMAT fields are returned as a single string. |
You can update the settings by running:
```sql
SET <setting> = <value>;
```
For example, to disable parsing of VCF INFO fields:
```sql
SET exon.vcf_parse_info = false;
```
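
When settings are changed from Rust code, the statement can be built programmatically. The sketch below is a hypothetical helper, not part of Exon's API: `set_statement` and `KNOWN_SETTINGS` are illustrative names, and the helper only covers the two boolean settings listed in the table.

```rust
/// Setting names listed in the table above.
const KNOWN_SETTINGS: [&str; 2] = ["exon.vcf_parse_info", "exon.vcf_parse_formats"];

/// Build a `SET` statement for one of the boolean settings in the table.
/// Returns `None` for a name the table does not list.
/// (Hypothetical helper for illustration; not provided by Exon.)
pub fn set_statement(name: &str, value: bool) -> Option<String> {
    if KNOWN_SETTINGS.iter().any(|known| *known == name) {
        // `bool` formats as `true` or `false`, matching the SQL above.
        Some(format!("SET {} = {};", name, value))
    } else {
        None
    }
}
```

The resulting string could then be passed to DataFusion's `SessionContext::sql`, assuming the session was created with Exon's extensions as in the Usage section.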
## Benchmarks
Please see the [benchmarks](exon-benchmarks) [README](exon-benchmarks/README.md) for more information.