edgebert

version: 0.3.4
created_at: 2025-08-25 23:09:00.73877+00
updated_at: 2025-09-12 10:22:28.82275+00
description: Fast local text embeddings library for Rust and WASM for BERT inference on native and edge devices with no dependencies.
repository: https://github.com/olafurjohannssson/edgebert
size: 247,666
owner: Ólafur Aron Jóhannsson (olafurjohannsson)

README

EdgeBERT

A pure Rust + WASM implementation for BERT inference with minimal dependencies


Overview

EdgeBERT is a lightweight Rust implementation of BERT inference aimed at native, edge, and WASM deployment. It runs sentence-transformers models anywhere, without Python or large ML runtimes.

Status

  • ✅ MiniLM-L6-v2 inference working
  • ✅ WASM support
  • ⚠️ Performance optimization ongoing
  • 🚧 Additional models coming soon

Contributions

All contributions are welcome; the project is still at a very early stage.

Components

  • Encoder: runs inference to turn text into embeddings
  • WordPiece tokenization: a small tokenizer based on WordPiece (see the sketch after this list)
  • Cross-platform: WebAssembly and native
  • No Python or C/C++ dependencies, aside from OpenBLAS, which ndarray uses for vectorized matrix operations
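
For illustration, WordPiece tokenization greedily matches the longest vocabulary entry at each position and marks word-internal continuations with a ## prefix. A minimal sketch follows; the vocabulary, the wordpiece helper, and the example words are hypothetical, not the crate's actual implementation.

use std::collections::HashSet;

/// Greedy longest-match WordPiece tokenization (illustrative only).
fn wordpiece(word: &str, vocab: &HashSet<&str>, unk: &str) -> Vec<String> {
    let chars: Vec<char> = word.chars().collect();
    let mut tokens = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        // Try the longest remaining substring first, shrinking until a vocab hit.
        let mut end = chars.len();
        let mut found = None;
        while start < end {
            let mut piece: String = chars[start..end].iter().collect();
            if start > 0 {
                piece = format!("##{piece}"); // continuation marker
            }
            if vocab.contains(piece.as_str()) {
                found = Some(piece);
                break;
            }
            end -= 1;
        }
        match found {
            Some(piece) => {
                tokens.push(piece);
                start = end;
            }
            // No vocabulary entry matches: the whole word becomes [UNK].
            None => return vec![unk.to_string()],
        }
    }
    tokens
}

fn main() {
    let vocab: HashSet<&str> = ["un", "##aff", "##able", "hello"].into_iter().collect();
    assert_eq!(wordpiece("unaffable", &vocab, "[UNK]"), vec!["un", "##aff", "##able"]);
    assert_eq!(wordpiece("hello", &vocab, "[UNK]"), vec!["hello"]);
}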

Use this if you need:

  • Embeddings in pure Rust without Python/C++ dependencies
  • BERT in browsers or edge devices
  • Offline RAG systems (see the sketch after these lists)
  • Small binary size over maximum speed

Don't use this if you need:

  • Multiple model architectures (currently only MiniLM)
  • GPU acceleration
  • Production-grade performance (use ONNX Runtime instead)
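
As a sketch of the offline-RAG use case: embed a document set once, then rank documents against a query by similarity. It reuses the Model::encode API shown in the examples below; the corpus strings are made up, and because the example outputs later in this README report unit-norm embeddings (First embedding norm: 1.000000), the dot product is used directly as the cosine similarity.

use anyhow::Result;
use edgebert::{Model, ModelType};

fn main() -> Result<()> {
    let model = Model::from_pretrained(ModelType::MiniLML6V2)?;

    // Hypothetical corpus; in a real RAG system these would be document chunks.
    let docs = vec![
        "Rust is a systems programming language",
        "BERT produces contextual embeddings",
        "The weather is sunny today",
    ];
    let query = "Which language is good for systems programming?";

    let doc_embs = model.encode(docs.clone(), true)?;
    let query_embs = model.encode(vec![query], true)?;
    let query_emb = &query_embs[0];

    // With unit-norm embeddings, cosine similarity reduces to the dot product.
    let mut scored: Vec<(f32, &str)> = doc_embs
        .iter()
        .zip(&docs)
        .map(|(e, d)| (e.iter().zip(query_emb).map(|(a, b)| a * b).sum(), *d))
        .collect();
    scored.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap());

    for (score, doc) in &scored {
        println!("{score:.3}  {doc}");
    }
    Ok(())
}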

Getting Started

1. Native Rust Application

For server-side or desktop applications, you can use the library directly.

Cargo.toml

[dependencies]
edgebert = "0.3.0"
anyhow = "1.0"

main.rs

use anyhow::Result;
use edgebert::{Model, ModelType};
fn main() -> Result<()> {
    let model = Model::from_pretrained(ModelType::MiniLML6V2)?;

    let texts = vec!["Hello world", "How are you"];
    let embeddings = model.encode(texts.clone(), true)?;

    for (i, embedding) in embeddings.iter().enumerate() {
        let n = embedding.len().min(10);
        println!("Text: {} == {:?}...", texts[i], &embedding[0..n]);
    }
    Ok(())
}

Output:

Text: Hello world == [-0.034439795, 0.030909885, 0.0066969804, 0.02608013, -0.03936993, -0.16037229, 0.06694216, -0.0065279473, -0.0474657, 0.014813968]...
Text: How are you == [-0.031447295, 0.03784213, 0.0761843, 0.045665547, -0.0012263817, -0.07488511, 0.08155286, 0.010209872, -0.11220472, 0.04075747]...

The full example is under examples/basic.rs. To build and run:

cargo run --release --example basic
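
The second argument to encode appears to control L2 normalization of the output embeddings. That reading is an assumption, but it matches the "First embedding norm: 1.000000" line in the Web Workers example below; a quick check:

use anyhow::Result;
use edgebert::{Model, ModelType};

fn main() -> Result<()> {
    let model = Model::from_pretrained(ModelType::MiniLML6V2)?;
    let embeddings = model.encode(vec!["Hello world"], true)?;

    // If `true` enables normalization, the embedding should have unit L2 norm.
    let norm: f32 = embeddings[0].iter().map(|v| v * v).sum::<f32>().sqrt();
    println!("norm = {norm:.6}"); // expected: 1.000000
    Ok(())
}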

2. WebAssembly

# Install wasm-pack
curl https://rustwasm.github.io/wasm-pack/installer/init.sh -sSf | sh

# Build
./scripts/wasm-build.sh

# Serve
cd examples && npx serve

examples/basic.html:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>EdgeBERT WASM Test</title>
</head>
<body>
<script type="module">
    import init, { WasmModel, WasmModelType } from './pkg/edgebert.js';

    async function run() {
        await init();

        const model = await WasmModel.from_type(WasmModelType.MiniLML6V2);
        const texts = ["Hello world", "How are you"];
        const embeddings = model.encode(texts, true);

        console.log("First 10 values:", embeddings.slice(0, 10));
    }

    run().catch(console.error);
</script>
</body>
</html>

Output:

First 10 values: Float32Array(10) [-0.034439802169799805, 0.03090989589691162, 0.006696964148432016, 0.02608015574514866, -0.03936990723013878, -0.16037224233150482, 0.06694218516349792, -0.006527911406010389, -0.04746570065617561, 0.014813981018960476, buffer: ArrayBuffer(40), byteLength: 40, byteOffset: 0, length: 10, Symbol(Symbol.toStringTag): 'Float32Array']

The full example is under examples/basic.html. To build, run scripts/wasm-build.sh, then go into examples/ and start a local server; npx serve can serve WASM.

3. Web Workers

See examples/worker.html and examples/worker.js for how to combine Web Workers with WebAssembly. The library handles both environments: when window is defined (as in basic.html) and when it is not (inside a worker).
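
One way such detection can be done is sketched below with wasm-bindgen; this is illustrative, not necessarily how the crate implements it. In a window context js_sys::global() casts to web_sys::Window; inside a worker it does not.

use wasm_bindgen::JsCast;

/// True on the main thread (window context), false inside a Web Worker.
/// Requires web-sys compiled with the "Window" feature. Illustrative sketch only.
fn has_window() -> bool {
    js_sys::global().dyn_into::<web_sys::Window>().is_ok()
}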

After compiling the WASM build (with scripts/wasm-build.sh the output lands in examples/pkg), run npx serve and open localhost:3000/worker.

Clicking "Generate embeddings" after the model loads produces:

Encoding texts: ["Hello world","How are you?"]
Embeddings shape: [2, 384]
'Hello world' vs 'How are you?': 0.305
First embedding norm: 1.000000
First 10 values: [-0.0344, 0.0309, 0.0067, 0.0261, -0.0394, -0.1604, 0.0669, -0.0065, -0.0475, 0.0148]

4. Comparison with PyTorch

A small PyTorch example that encodes the same texts and computes their similarity (the Web Workers demo above reports a lower similarity, 0.305, because its second text carries a trailing question mark):

from sentence_transformers import SentenceTransformer
import torch
import torch.nn.functional as F

texts = ["Hello world", "How are you"]

model = SentenceTransformer('all-MiniLM-L6-v2')

embeddings = model.encode(texts, convert_to_tensor=True)

for i, emb in enumerate(embeddings):
    print(f"Text: {texts[i]}")
    print("First 10 values:", emb[:10].tolist())
    print()

cos_sim = F.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(f"Cosine similarity ('{texts[0]}' vs '{texts[1]}'):", cos_sim.item())

Output from Python:

Text: Hello world
First 10 values: [-0.0345, 0.0310, 0.0067, 0.0261, -0.0394, -0.1603, 0.0669, -0.0064, -0.0475, 0.0148]

Text: How are you
First 10 values: [-0.0314, 0.0378, 0.0763, 0.0457, -0.0012, -0.0748, 0.0816, 0.0102, -0.1122, 0.0407]

Cosine similarity ('Hello world' vs 'How are you'): 0.3624

EdgeBERT

use anyhow::Result;
use edgebert::{Model, ModelType};

fn main() -> Result<()> {
    let model = Model::from_pretrained(ModelType::MiniLML6V2)?;
    let texts = vec!["Hello world", "How are you"];
    let embeddings = model.encode(texts.clone(), true)?;

    for (i, embedding) in embeddings.iter().enumerate() {
        let n = embedding.len().min(10);
        println!("Text: {} == {:?}...", texts[i], &embedding[0..n]);
    }

    let dot: f32 = embeddings[0].iter().zip(&embeddings[1]).map(|(a, b)| a * b).sum();
    let norm_a: f32 = embeddings[0].iter().map(|v| v * v).sum::<f32>().sqrt();
    let norm_b: f32 = embeddings[1].iter().map(|v| v * v).sum::<f32>().sqrt();
    let cos_sim = dot / (norm_a * norm_b);

    println!("\nCosine similarity ('{}' vs '{}'): {:.4}", texts[0], texts[1], cos_sim);

    Ok(())
}

Output:

Text: Hello world == [-0.034439795, 0.030909885, 0.0066969804, 0.02608013, -0.03936993, -0.16037229, 0.06694216, -0.0065279473, -0.0474657, 0.014813968]...
Text: How are you == [-0.031447295, 0.03784213, 0.0761843, 0.045665547, -0.0012263817, -0.07488511, 0.08155286, 0.010209872, -0.11220472, 0.04075747]...

Cosine similarity ('Hello world' vs 'How are you'): 0.3623

The cosine similarity is 0.3623 in Rust and 0.3624 in Python, agreement to about 0.03%; the tiny discrepancy comes from floating-point rounding differences.
