[![Actions Status](https://github.com/creativcoder/avrow/workflows/ci/badge.svg)](https://github.com/creativcoder/avrow/actions)
[![crates](https://img.shields.io/crates/v/avrow.svg)](https://crates.io/crates/avrow)
[![docs.rs](https://docs.rs/avrow/badge.svg)](https://docs.rs/avrow/)
[![license](https://img.shields.io/badge/License-MIT-blue.svg)](https://github.com/creativcoder/avrow/blob/master/LICENSE-MIT)
[![license](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/creativcoder/avrow/blob/master/LICENSE-APACHE)
[![Contributor Covenant](https://img.shields.io/badge/contributor%20covenant-v1.4%20adopted-ff69b4.svg)](CODE_OF_CONDUCT.md)
### Avrow is a pure Rust implementation of the [Avro specification](https://avro.apache.org/docs/current/spec.html) with [Serde](https://github.com/serde-rs/serde) support.
### Table of Contents
- [Overview](#overview)
- [Features](#features)
- [Getting started](#getting-started)
- [Examples](#examples)
- [Writing avro data](#writing-avro-data)
- [Reading avro data](#reading-avro-data)
- [Writer builder](#writer-customization)
- [Supported Codecs](#supported-codecs)
- [Using the avrow-cli tool](#using-avrow-cli-tool)
- [Benchmarks](#benchmarks)
- [Todo](#todo)
- [Changelog](#changelog)
- [Contributions](#contributions)
- [Support](#support)
- [MSRV](#msrv)
- [License](#license)
## Overview
Avrow is a pure Rust implementation of the [Avro specification](https://avro.apache.org/docs/current/spec.html): a row-based data serialization system. The Avro serialization format is widely used in big data streaming systems such as [Kafka](https://kafka.apache.org/) and [Spark](https://spark.apache.org/).
In Avro's terminology, an Avro-encoded file or byte stream is called a "data file".
To write data in the Avro format, one needs a schema, which is provided as JSON. Here's an example of an Avro schema:
```json
{
"type": "record",
"name": "LongList",
"aliases": ["LinkedLongs"],
"fields" : [
{"name": "value", "type": "long"},
{"name": "next", "type": ["null", "LongList"]}
]
}
```
The above schema is of type record with two fields and represents a linked list of 64-bit integers. In most implementations, this schema is fed to a `Writer` instance along with a buffer to write encoded data to. One can then call one of the `write` methods on the writer to write data. One distinguishing aspect of Avro is that the schema for the encoded data is written into the header of the data file. This means that when reading data you don't need to provide a schema to a `Reader` instance. The spec also allows providing a separate reader schema to filter data when reading.
The Avro specification provides two kinds of encoding:
* Binary encoding - efficient and takes less space on disk.
* JSON encoding - a human-readable version of Avro-encoded data, also useful for debugging.
This crate implements only the binary encoding, as that's the format used in practice for performance and storage reasons.
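To give a feel for the binary encoding: the Avro spec serializes `int` and `long` values with zigzag encoding followed by variable-length base-128 bytes. A minimal, dependency-free sketch of that scheme (an illustration of the spec, not avrow's internal code):

```rust
// Encode a signed 64-bit integer the way Avro's binary encoding
// serializes `long` values: zigzag, then variable-length base-128.
fn encode_long(n: i64) -> Vec<u8> {
    // Zigzag: map signed to unsigned so small magnitudes stay small
    // (0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...)
    let mut z = ((n as u64) << 1) ^ ((n >> 63) as u64);
    let mut out = Vec::new();
    loop {
        let byte = (z & 0x7f) as u8;
        z >>= 7;
        if z == 0 {
            out.push(byte); // last byte: continuation bit clear
            break;
        }
        out.push(byte | 0x80); // more bytes follow: continuation bit set
    }
    out
}

fn main() {
    assert_eq!(encode_long(0), vec![0x00]);
    assert_eq!(encode_long(-1), vec![0x01]);
    assert_eq!(encode_long(1), vec![0x02]);
    assert_eq!(encode_long(64), vec![0x80, 0x01]);
}
```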
## Features
* Full support for recursive self-referential schemas with Serde serialization/deserialization.
* All compression codecs (`deflate`, `bzip2`, `snappy`, `xz`, `zstd`) supported as per the spec.
* Simple and intuitive API - As the underlying structures in use are `Read` and `Write` types, avrow mimics Rust's standard library APIs for minimal learning overhead. Writing Avro values is simply calling `write` (or `serialize` with serde), and reading Avro values is simply using iterators.
* Less bloat / lightweight - Compile times in Rust are costly, so avrow tries to use minimal third-party crates. Compression codecs and schema fingerprinting support are feature-gated. To use them, compile with the respective feature flags (e.g. `--features zstd`).
* Schema evolution - One can configure the avrow `Reader` with a reader schema and read only the data relevant to their use case.
* Schemas in avrow support querying their canonical form and fingerprinting (`rabin64`, `sha256`, `md5`).
**Note**: This is not yet a complete implementation of the spec; the remaining features are listed in the [Todo](#todo) section.
## Getting started
Add avrow as a dependency to `Cargo.toml`:
```toml
[dependencies]
avrow = "0.2.0"
```
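Compression codecs and schema fingerprinting are feature-gated (see [Features](#features)). To enable one, turn on the corresponding Cargo feature, e.g. for `zstd`:

```toml
[dependencies]
avrow = { version = "0.2.0", features = ["zstd"] }
```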
## Examples
### Writing avro data
```rust
use anyhow::Error;
use avrow::{Schema, Writer};
use std::str::FromStr;

fn main() -> Result<(), Error> {
    // Create a schema from JSON
    let schema = Schema::from_str(r##"{"type":"string"}"##)?;
    // or from a path
    let _schema2 = Schema::from_path("./string_schema.avsc")?;
    // Create an output stream
    let stream = Vec::new();
    // Create a writer (takes any `Write` type by value)
    let mut writer = Writer::new(&schema, stream)?;
    // Write your data!
    writer.write("Hey")?;
    // or use the serialize method for serde-derived types
    writer.serialize("there!")?;
    Ok(())
}
```
For simple native Rust types, avrow provides `From` impls to convert them to Avro value types. For compound or user-defined types (structs or enums), one can use the `serialize` method, which relies on serde. Alternatively, one can construct `avrow::Value` instances directly, which is a more verbose way to write Avro values and should be a last resort.
### Reading avro data
```rust
use anyhow::Error;
use avrow::{Reader, Schema};
use std::str::FromStr;

fn main() -> Result<(), Error> {
    // The header of the data below declares a "bytes" writer schema
    let schema = Schema::from_str(r##""bytes""##)?;
    let data = vec![
        79, 98, 106, 1, 4, 22, 97, 118, 114, 111, 46, 115, 99, 104, 101,
        109, 97, 32, 123, 34, 116, 121, 112, 101, 34, 58, 34, 98, 121, 116,
        101, 115, 34, 125, 20, 97, 118, 114, 111, 46, 99, 111, 100, 101,
        99, 14, 100, 101, 102, 108, 97, 116, 101, 0, 145, 85, 112, 15, 87,
        201, 208, 26, 183, 148, 48, 236, 212, 250, 38, 208, 2, 18, 227, 97,
        96, 100, 98, 102, 97, 5, 0, 145, 85, 112, 15, 87, 201, 208, 26,
        183, 148, 48, 236, 212, 250, 38, 208,
    ];
    // Create a Reader
    let reader = Reader::with_schema(data.as_slice(), &schema)?;
    for value in reader {
        dbg!(&value);
    }
    Ok(())
}
```
### Self-referential recursive schema example
```rust
use anyhow::Error;
use avrow::{from_value, Codec, Reader, Schema, Writer};
use serde::{Deserialize, Serialize};
#[derive(Debug, Serialize, Deserialize)]
struct LongList {
value: i64,
next: Option