Crates.io | sif-embedding |
lib.rs | sif-embedding |
version | 0.6.1 |
source | src |
created_at | 2023-05-05 16:40:34.636627 |
updated_at | 2023-12-08 17:17:44.36442 |
description | Smooth inverse frequency (SIF), a simple but powerful embedding technique for sentences |
homepage | https://github.com/kampersanda/sif-embedding |
repository | https://github.com/kampersanda/sif-embedding |
id | 857905 |
size | 84,153 |
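This is a Rust implementation of simple but powerful sentence embedding algorithms based on SIF (Sanjeev Arora, Yingyu Liang, and Tengyu Ma, "A Simple but Tough-to-Beat Baseline for Sentence Embeddings", ICLR 2017) and uSIF (Kawin Ethayarajh, "Unsupervised Random Walk Sentence Embeddings: A Strong but Simple Baseline", ACL 2018 Workshop on Representation Learning for NLP).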
This library will help you if
https://docs.rs/sif-embedding/
See tutorial.
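For orientation, the core idea of SIF as described in the original paper can be sketched without this crate: each word vector is weighted by `a / (a + p(w))`, where `p(w)` is the word's unigram probability and `a` is a small smoothing constant, the weighted vectors are averaged, and the projection onto a common direction `u` is removed. The following is a minimal sketch under stated assumptions, not the crate's API: the function names, the `HashMap` tables, and the choice to skip and not count unknown words are all hypothetical, and estimating `u` itself (the first singular vector of the sentence-embedding matrix in the paper) is omitted.

```rust
use std::collections::HashMap;

/// SIF weight for a word with unigram probability `p`: a / (a + p).
/// Frequent words (large `p`) get weights close to 0.
fn sif_weight(a: f64, p: f64) -> f64 {
    a / (a + p)
}

/// Weighted average of the word vectors of one sentence.
/// Assumption of this sketch: words missing from either table are skipped
/// and do not count toward the average.
fn weighted_average(
    words: &[&str],
    vectors: &HashMap<&str, Vec<f64>>,
    probs: &HashMap<&str, f64>,
    a: f64,
    dim: usize,
) -> Vec<f64> {
    let mut sum = vec![0.0; dim];
    let mut count = 0usize;
    for w in words {
        if let (Some(v), Some(&p)) = (vectors.get(*w), probs.get(*w)) {
            let weight = sif_weight(a, p);
            for (s, x) in sum.iter_mut().zip(v) {
                *s += weight * x;
            }
            count += 1;
        }
    }
    if count > 0 {
        for s in sum.iter_mut() {
            *s /= count as f64;
        }
    }
    sum
}

/// Remove the projection of `v` onto the unit vector `u`: v - (u . v) u.
fn remove_component(v: &[f64], u: &[f64]) -> Vec<f64> {
    let dot: f64 = v.iter().zip(u).map(|(x, y)| x * y).sum();
    v.iter().zip(u).map(|(x, y)| x - dot * y).collect()
}
```

With `a = 0.001`, a frequent word with `p = 0.1` receives a weight of roughly 0.0099, while a rare word with `p = 0.001` receives 0.5, so rare words dominate the sentence vector.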
benchmarks provides speed benchmarks.
We observed that, with an English Wikipedia dataset, our SIF implementation could process ~80K sentences per second on a MacBook Air (one core of Apple M2, 24 GB RAM).
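A sentences-per-second figure like the one above can be obtained by timing a loop over the input with `std::time::Instant`. The sketch below is only an illustration of that measurement, not the code in `benchmarks`: `measure_throughput` and `sentences_per_second` are hypothetical helpers, and the `embed` closure stands in for whatever embedding call is being timed.

```rust
use std::time::{Duration, Instant};

/// Convert a sentence count and an elapsed duration into sentences per second.
fn sentences_per_second(num_sentences: usize, elapsed: Duration) -> f64 {
    num_sentences as f64 / elapsed.as_secs_f64()
}

/// Time `embed` over all sentences and return the observed throughput.
/// `embed` is a placeholder for the embedding call under test.
fn measure_throughput<F: FnMut(&str)>(sentences: &[&str], mut embed: F) -> f64 {
    let start = Instant::now();
    for &s in sentences {
        embed(s);
    }
    sentences_per_second(sentences.len(), start.elapsed())
}
```

For stable numbers, such a measurement is usually run in release mode on a large input, since timings on a handful of sentences are dominated by noise.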
evaluations provides tools to evaluate sif-embedding on several similarity evaluation tasks.
evaluations/senteval provides evaluation tools and results for SentEval STS/SICK Tasks.
As one example, the following table shows the evaluation results on the STS-Benchmark, measured by Spearman's rank correlation coefficient.
Model | train | dev | test | Avg. |
---|---|---|---|---|
sif_embedding::Sif | 65.2 | 75.3 | 63.6 | 68.0 |
sif_embedding::USif | 68.0 | 78.2 | 66.3 | 70.8 |
princeton-nlp/unsup-simcse-bert-base-uncased | 76.9 | 81.7 | 76.5 | 78.4 |
princeton-nlp/sup-simcse-bert-base-uncased | 83.3 | 86.2 | 84.3 | 84.6 |
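Spearman's rank correlation coefficient is the Pearson correlation computed on the ranks of the two score lists (here, predicted similarities and human judgments), with tied values sharing their average rank. The following is a minimal sketch of that computation, not the code used in `evaluations`; the function names are hypothetical, and the table values are presumably the coefficient multiplied by 100.

```rust
/// 1-based ranks; tied values share the mean of the positions they occupy.
fn ranks(values: &[f64]) -> Vec<f64> {
    let mut idx: Vec<usize> = (0..values.len()).collect();
    idx.sort_by(|&i, &j| values[i].total_cmp(&values[j]));
    let mut out = vec![0.0; values.len()];
    let mut start = 0;
    while start < idx.len() {
        let mut end = start;
        while end + 1 < idx.len() && values[idx[end + 1]] == values[idx[start]] {
            end += 1;
        }
        // Sorted positions start..=end hold equal values; assign their mean rank.
        let mean_rank = (start + end) as f64 / 2.0 + 1.0;
        for &i in &idx[start..=end] {
            out[i] = mean_rank;
        }
        start = end + 1;
    }
    out
}

/// Pearson correlation of two equal-length slices.
fn pearson(x: &[f64], y: &[f64]) -> f64 {
    let n = x.len() as f64;
    let mx = x.iter().sum::<f64>() / n;
    let my = y.iter().sum::<f64>() / n;
    let mut cov = 0.0;
    let mut vx = 0.0;
    let mut vy = 0.0;
    for (a, b) in x.iter().zip(y) {
        cov += (a - mx) * (b - my);
        vx += (a - mx).powi(2);
        vy += (b - my).powi(2);
    }
    cov / (vx.sqrt() * vy.sqrt())
}

/// Spearman's rho: Pearson correlation of the rank-transformed inputs.
fn spearman(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len());
    pearson(&ranks(x), &ranks(y))
}
```

Because only ranks matter, any monotonically increasing relationship between predictions and gold scores yields a coefficient of 1, regardless of scale.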
evaluations/japanese provides evaluation tools and results for JGLUE JSTS and JSICK tasks.
As one example, the following table shows the evaluation results, measured by Spearman's rank correlation coefficient.
Model | JSICK (test) | JSTS (train) | JSTS (val) | Avg. |
---|---|---|---|---|
sif_embedding::Sif | 79.7 | 67.6 | 74.6 | 74.0 |
sif_embedding::USif | 79.7 | 69.3 | 76.0 | 75.0 |
cl-nagoya/unsup-simcse-ja-base | 79.0 | 74.5 | 79.0 | 77.5 |
cl-nagoya/unsup-simcse-ja-large | 79.6 | 77.8 | 81.4 | 79.6 |
cl-nagoya/sup-simcse-ja-base | 82.8 | 77.9 | 80.9 | 80.5 |
cl-nagoya/sup-simcse-ja-large | 83.1 | 79.6 | 83.1 | 81.9 |
qdrant-examples provides an example of using sif-embedding with qdrant/rust-client.
Troubleshooting: tips on resolving errors I encountered in my environment.
Licensed under either of
at your option.