# BGE Small English Embedding Library

This Rust library provides an interface for generating embeddings with the BGE Small English v1.5 model from Hugging Face, designed for dense retrieval applications. The model is part of the FlagEmbedding project, which focuses on retrieval-augmented LLMs, and offers state-of-the-art performance for embedding generation.

- Rust docs: https://docs.rs/bge/latest/bge/struct.Bge.html
- Crates.io: https://crates.io/crates/bge

## Features

- Load and use the BGE Small English v1.5 model for embedding generation.
- Normalize embeddings for comparison.
- Handle large inputs and errors gracefully.

## Model Reference

The BGE Small English v1.5 model is available on Hugging Face: [https://huggingface.co/BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5). This model is part of the FlagEmbedding project, which includes various tools and models for retrieval-augmented LLMs. For more details, visit the [FlagEmbedding GitHub](https://github.com/flagembedding).

## Getting Started

To use this library, you will first need to download the necessary model and tokenizer files from Hugging Face:

- Tokenizer file: [tokenizer.json](https://huggingface.co/BAAI/bge-small-en-v1.5/blob/main/tokenizer.json)
- Model file: [model.onnx](https://huggingface.co/BAAI/bge-small-en-v1.5/blob/main/onnx/model.onnx)

Save these files in a known directory on your local machine.

## Installation

Ensure Rust is installed on your system. Then, add this library to your project's `Cargo.toml` file.
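Because the library normalizes embeddings (see Features above), comparing two texts reduces to a plain dot product of their vectors. The following is a minimal, self-contained sketch of that comparison; the `cosine_similarity` helper and the short 4-dimensional vectors are illustrative stand-ins for real BGE embeddings, not part of the library's API:

```rust
/// Cosine similarity of two L2-normalized embedding vectors.
/// For unit-length vectors this reduces to a plain dot product.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embedding dimensions must match");
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

fn main() {
    // Hypothetical 4-dimensional unit vectors standing in for
    // real BGE embeddings.
    let a = [1.0, 0.0, 0.0, 0.0];
    let b = [0.6, 0.8, 0.0, 0.0];
    println!("similarity = {}", cosine_similarity(&a, &b)); // → 0.6
}
```

In practice `a` and `b` would be two vectors returned by the library's embedding call; higher values mean more semantically similar texts.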
## Including `bge` in Your Project

To use `bge` in your project, add the following to your `Cargo.toml` file:

```toml
[dependencies]
bge = "0.1.0"

# If your project requires `ort` binaries to be automatically downloaded,
# include `ort` with the `download-binaries` feature enabled:
ort = { version = "2.0.0-rc.1", default-features = false, features = ["download-binaries"] }
```

## Usage

### Loading the Model

First, initialize the `Bge` struct with the paths to the tokenizer and model files:

```rust
use bge::Bge;

let bge = Bge::from_files("path/to/tokenizer.json", "path/to/model.onnx").unwrap();
```

### Generating Embeddings

To generate embeddings for a given input text:

```rust
let input_text = "Your input text here.";
let embeddings = bge.create_embeddings(input_text).unwrap();
println!("Embeddings: {:?}", embeddings);
```

This prints the embeddings generated by the model for the input text.

## Handling Errors

The library can return errors in several scenarios, such as when the input exceeds the model's token limit or when the model fails to load. Handle these errors appropriately in your application rather than calling `unwrap` in production code.

## Contribution

Contributions to this library are welcome. If you encounter any issues or have suggestions for improvements, please open an issue or submit a pull request.

## License

This library is licensed under the MIT License. The BGE models provided by Hugging Face can be used for commercial purposes free of charge.

---
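The token-limit errors mentioned under Handling Errors can often be avoided by splitting long inputs into smaller chunks before embedding each one. Below is a hedged sketch of such chunking; `chunk_text` is an illustrative helper (not part of the library), and it uses whitespace-separated words as a rough proxy for tokens, so leave generous headroom below the model's actual limit:

```rust
/// Splits long input into word-based chunks so each piece stays well
/// under the model's sequence limit. Words only approximate tokens,
/// so choose `max_words` conservatively.
fn chunk_text(text: &str, max_words: usize) -> Vec<String> {
    let words: Vec<&str> = text.split_whitespace().collect();
    words
        .chunks(max_words)
        .map(|chunk| chunk.join(" "))
        .collect()
}

fn main() {
    let long_input = "one two three four five six seven";
    for chunk in chunk_text(long_input, 3) {
        // Each chunk could then be passed to `create_embeddings`
        // individually instead of embedding the whole input at once.
        println!("{}", chunk);
    }
}
```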