Rust server plugin for deploying deep learning models with batched prediction.
Deep learning models are usually implemented to make efficient use of a GPU by batching inputs together
in "mini-batches". However, applications serving these models often receive requests one-by-one.
Therefore, a conventional single- or multi-threaded server approach will under-utilize the GPU and lead to latency that increases
linearly with the volume of requests.
`batched-fn` is a drop-in solution for deep learning webservers that queues individual requests and provides them as a batch
to your model. It can be added to any application with minimal refactoring simply by inserting the [`batched_fn`](https://docs.rs/batched-fn/latest/batched_fn/macro.batched_fn.html)
macro into the function that runs requests through the model.
## Features
- 🚀 Easy to use: drop the `batched_fn!` macro into existing code.
- 🔥 Lightweight and fast: queue system implemented on top of the blazingly fast [flume crate](https://github.com/zesterer/flume).
- 🙌 Easy to tune: simply adjust [`max_delay`](https://docs.rs/batched-fn/latest/batched_fn/macro.batched_fn.html#config) and [`max_batch_size`](https://docs.rs/batched-fn/latest/batched_fn/macro.batched_fn.html#config).
- 🛑 [Back pressure](https://medium.com/@jayphelps/backpressure-explained-the-flow-of-data-through-software-2350b3e77ce7) mechanism included:
just set [`channel_cap`](https://docs.rs/batched-fn/latest/batched_fn/macro.batched_fn.html#config) and handle
[`Error::Full`](https://docs.rs/batched-fn/latest/batched_fn/enum.Error.html#variant.Full) by returning a 503 from your webserver, as sketched below.
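As a rough illustration of that back-pressure setup, here is a minimal sketch (not taken verbatim from the crate documentation): the `Input`, `Output`, and `Model` stubs mirror the types from the Examples section below, and the `handle_request` function with its bare numeric status codes is a hypothetical stand-in for however your web framework builds a 503 response.

```rust
use batched_fn::{batched_fn, Error};

// Hypothetical stand-ins for the `Input`, `Output`, and `Model` types defined
// in the Examples section below.
#[derive(Debug)]
struct Input {
    // ...
}

#[derive(Debug)]
struct Output {
    // ...
}

struct Model {
    // ...
}

impl Model {
    fn load() -> Self {
        Model {}
    }

    fn predict(&self, batch: Vec<Input>) -> Vec<Output> {
        batch.iter().map(|_input| Output {}).collect()
    }
}

// Hypothetical request handler: adapt the `Result<Output, u16>` return type to
// whatever your web framework uses to build responses.
async fn handle_request(input: Input) -> Result<Output, u16> {
    let batch_predict = batched_fn! {
        handler = |batch: Vec<Input>, model: &Model| -> Vec<Output> {
            model.predict(batch)
        };
        config = {
            // Flush a batch once 16 inputs are queued...
            max_batch_size: 16,
            // ...or after 50 milliseconds, whichever comes first.
            max_delay: 50,
            // Bound the queue so overload triggers back pressure instead of
            // unbounded memory growth and queueing latency.
            channel_cap: Some(32),
        };
        context = {
            model: Model::load(),
        };
    };

    match batch_predict(input).await {
        Ok(output) => Ok(output),
        // The queue is at `channel_cap`: tell the client to retry later.
        Err(Error::Full) => Err(503),
        // Any other failure (e.g. the batching thread shut down).
        Err(_) => Err(500),
    }
}
```

With `channel_cap` set, a traffic burst beyond the queue's capacity fails fast with `Error::Full` rather than accumulating unbounded queueing latency, which is usually preferable for an online inference service.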
## Examples
Suppose you have a model API that looks like this:
```rust
// `Batch` could be anything that implements the `batched_fn::Batch` trait.
type Batch<T> = Vec<T>;

#[derive(Debug)]
struct Input {
    // ...
}

#[derive(Debug)]
struct Output {
    // ...
}

struct Model {
    // ...
}

impl Model {
    fn predict(&self, batch: Batch<Input>) -> Batch<Output>