Crates.io | swiftide |
lib.rs | swiftide |
version | 0.5.0 |
source | src |
created_at | 2024-06-13 12:18:55.919311 |
updated_at | 2024-07-15 16:06:55.472801 |
description | Blazing fast, streaming document and code indexation |
homepage | https://swiftide.rs |
repository | https://github.com/bosun-ai/swiftide-rs |
max_upload_size | |
id | 1270590 |
size | 176,142 |
Blazing fast data pipelines for Retrieval Augmented Generation written in Rust
Swiftide is a data indexing and processing library tailored for Retrieval Augmented Generation (RAG). When building applications with large language models (LLMs), these LLMs need access to external resources. Data needs to be transformed, enriched, split up, embedded, and persisted. Swiftide is built in Rust, uses parallel, asynchronous streams, and is blazingly fast.
While working with other Python-based tooling, frustrations arose around performance, stability, and ease of use. Thus, Swiftide was born. Indexing performance went from tens of minutes to a few seconds.
Swiftide is part of the bosun.ai project, an upcoming platform for autonomous code improvement.
We <3 feedback: project ideas, suggestions, and complaints are very welcome. Feel free to open an issue.
[!CAUTION] Swiftide is under heavy development and can have breaking changes while we work towards 1.0. Documentation here might fall short of all features and, despite our efforts, be slightly outdated. Expect bugs. We recommend always keeping an eye on our GitHub and API documentation. If you find an issue or have any kind of feedback, we'd love to hear from you in an issue.
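The following pipeline indexes every Rust file in the current directory: it skips nodes that are already cached in Redis, adds question-and-answer metadata with OpenAI, chunks the code, embeds the chunks in batches of 10, and stores them in Qdrant. redis_url, qdrant_url and openai_client are assumed to be set up beforehand (see /examples).

indexing::Pipeline::from_loader(FileLoader::new(".").with_extensions(&["rs"]))
    // Skip nodes that are already cached in Redis
    .filter_cached(Redis::try_from_url(
        redis_url,
        "swiftide-examples",
    )?)
    // Generate questions and answers about the code and add them as metadata
    .then(MetadataQACode::new(openai_client.clone()))
    // Split the code into chunks with a size between 10 and 2048
    .then_chunk(ChunkCode::try_for_language_and_chunk_size(
        "rust",
        10..2048,
    )?)
    // Embed the chunks with OpenAI, in batches of 10 nodes
    .then_in_batch(10, Embed::new(openai_client.clone()))
    // Store the nodes in Qdrant using 1536-dimensional vectors
    .then_store_with(
        Qdrant::try_from_url(qdrant_url)?
            .batch_size(50)
            .vector_size(1536)
            .collection_name("swiftide-examples".to_string())
            .build()?,
    )
    .run()
    .await?;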
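In the example above, then_in_batch(10, ...) hands nodes to the embedder in groups rather than one at a time. The sketch below shows that grouping over a plain Vec, assuming batches are filled in arrival order; Node and into_batches are hypothetical names for illustration, and the real pipeline works on an async stream, so its batching may differ.

```rust
/// Hypothetical stand-in for a node; only the chunk text matters here.
#[derive(Debug, Clone)]
struct Node {
    chunk: String,
}

/// Group nodes into batches of at most `batch_size`, the way a batch
/// transformer would receive them (assumption: batches keep arrival order).
fn into_batches(nodes: Vec<Node>, batch_size: usize) -> Vec<Vec<Node>> {
    assert!(batch_size > 0, "batch size must be positive");
    nodes.chunks(batch_size).map(|batch| batch.to_vec()).collect()
}

fn main() {
    // 25 nodes with a batch size of 10 give three batches: 10, 10 and 5 nodes.
    let nodes: Vec<Node> = (0..25)
        .map(|i| Node { chunk: format!("chunk {i}") })
        .collect();
    let batches = into_batches(nodes, 10);
    let sizes: Vec<usize> = batches.iter().map(Vec::len).collect();
    println!("{} batches with sizes {:?}", batches.len(), sizes);
    println!("first node of the last batch: {}", batches[2][0].chunk);
}
```

Larger batches mean fewer requests to the embedding API, at the cost of bigger individual requests.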
You can find more examples in /examples.

Logging and tracing are supported through the tracing crate; see /examples and the tracing crate for more information.

Our goal is to create a fast, extendable platform for data indexing and querying to further the development of automated LLM applications, with an easy-to-use and easy-to-extend API.
Make sure you have the Rust toolchain installed; rustup is the recommended approach.

To use OpenAI, an API key is required. Note that by default async_openai reads the key from the OPENAI_API_KEY environment variable.

Other integrations need to be installed accordingly.
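To fail fast with a clear message when the key is missing, you could check OPENAI_API_KEY before building a pipeline. This is a minimal sketch using only the standard library; check_api_key and openai_api_key are hypothetical helpers, not part of Swiftide or async_openai.

```rust
use std::env;

/// Validate a raw OPENAI_API_KEY value (None when unset or not valid Unicode).
/// Hypothetical helper, kept separate from the environment so it is easy to test.
fn check_api_key(raw: Option<String>) -> Result<String, String> {
    match raw {
        Some(key) if !key.trim().is_empty() => Ok(key),
        Some(_) => Err(String::from("OPENAI_API_KEY is set but empty")),
        None => Err(String::from("OPENAI_API_KEY is not set")),
    }
}

/// Read the same variable async_openai uses by default.
fn openai_api_key() -> Result<String, String> {
    check_api_key(env::var("OPENAI_API_KEY").ok())
}

fn main() {
    match openai_api_key() {
        Ok(_) => println!("OpenAI API key found"),
        Err(msg) => eprintln!("{msg}"),
    }
}
```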
To get started:
1. Set up a new Rust project.
2. Add swiftide:
   cargo add swiftide
3. In your Cargo.toml, enable the features of the integrations you would like to use, or use 'all'.
4. Write a pipeline (see our examples and documentation).
Before building your stream, you need to enable and configure any integrations required. See /examples.
A stream starts with a Loader that emits Nodes. For instance, with the FileLoader each file is a Node.
You can then slice and dice, augment, and filter nodes. Each kind of step in the pipeline requires a different trait, which makes the pipeline easy to extend.
Nodes have a path, a chunk and metadata. Currently, metadata is copied over when chunking and is always embedded when using the OpenAIEmbed transformer.
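To make the metadata behaviour concrete, here is a minimal sketch with a hypothetical Node type (a path, a chunk of text, and key/value metadata). It assumes a naive fixed-size character splitter in place of Swiftide's chunkers: each chunk inherits a copy of the parent's metadata, and the text handed to an embedder is the metadata followed by the chunk. That text layout is an assumption; Swiftide's own format may differ.

```rust
use std::collections::BTreeMap;
use std::path::PathBuf;

/// Simplified, hypothetical stand-in for a Swiftide node.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    path: PathBuf,
    chunk: String,
    metadata: BTreeMap<String, String>,
}

/// Split a node into pieces of at most `max_chars` characters. Every piece
/// keeps the parent's path and a copy of its metadata.
fn chunk_node(node: &Node, max_chars: usize) -> Vec<Node> {
    assert!(max_chars > 0, "max_chars must be positive");
    let chars: Vec<char> = node.chunk.chars().collect();
    chars
        .chunks(max_chars)
        .map(|piece| Node {
            path: node.path.clone(),
            chunk: piece.iter().collect(),
            metadata: node.metadata.clone(),
        })
        .collect()
}

/// Text that would be embedded (assumed layout): metadata lines, then the chunk.
fn embeddable_text(node: &Node) -> String {
    let mut text = String::new();
    for (key, value) in &node.metadata {
        text.push_str(&format!("{key}: {value}\n"));
    }
    text.push_str(&node.chunk);
    text
}

fn main() {
    let mut metadata = BTreeMap::new();
    metadata.insert("summary".to_string(), "Prints a greeting".to_string());
    let node = Node {
        path: PathBuf::from("src/main.rs"),
        chunk: "fn main() { println!(\"hi\"); }".to_string(),
        metadata,
    };
    for piece in chunk_node(&node, 12) {
        println!("{}\n---", embeddable_text(&piece));
    }
}
```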
The available steps, and the trait each one requires:
- from_loader (impl Loader): starting point of the stream; creates and emits Nodes
- filter_cached (impl NodeCache): filters out nodes that are already cached
- then (impl Transformer): transforms a node and puts it on the stream
- then_in_batch (impl BatchTransformer): transforms multiple nodes and puts them on the stream
- then_chunk (impl ChunkerTransformer): transforms a single node and emits multiple nodes
- then_store_with (impl Storage): stores the nodes in a storage backend; this can be chained

Additionally, several generic transformers are implemented. They take implementers of SimplePrompt and EmbedModel to do their work.
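The split between step traits and generic transformers can be sketched with simplified, synchronous stand-ins. Everything below is hypothetical: Swiftide's real traits are async and stream based with different signatures, and AddSummary, FakeLlm, InMemoryLoader, SeenPaths, ByLine and VecStorage are toy types invented for illustration. The sketch shows a loader feeding a cache filter, a transformer that is generic over a SimplePrompt-like trait, a chunker, and a storage.

```rust
use std::collections::HashSet;

// Hypothetical, synchronous stand-ins for the pipeline roles listed above.
// Swiftide's real traits are async and stream based; only the shape is shown.
#[derive(Debug, Clone)]
struct Node {
    path: String,
    chunk: String,
    metadata: Vec<(String, String)>,
}

trait Loader {
    fn load(&self) -> Vec<Node>; // starting point: creates and emits nodes
}
trait NodeCache {
    fn is_new(&mut self, node: &Node) -> bool; // false for nodes seen before
}
trait Transformer {
    fn transform(&self, node: Node) -> Node; // one node in, one node out
}
trait ChunkerTransformer {
    fn chunk(&self, node: Node) -> Vec<Node>; // one node in, many nodes out
}
trait Storage {
    fn store(&mut self, nodes: Vec<Node>); // persists nodes
}
// Stand-in for SimplePrompt: anything that answers a text prompt.
trait SimplePrompt {
    fn prompt(&self, prompt: &str) -> String;
}

// Generic transformer: works with any SimplePrompt implementer.
struct AddSummary<P: SimplePrompt> { llm: P }
impl<P: SimplePrompt> Transformer for AddSummary<P> {
    fn transform(&self, mut node: Node) -> Node {
        let answer = self.llm.prompt(&format!("Summarize:\n{}", node.chunk));
        node.metadata.push(("summary".to_string(), answer));
        node
    }
}

// Toy implementations so the sketch runs offline.
struct FakeLlm;
impl SimplePrompt for FakeLlm {
    fn prompt(&self, prompt: &str) -> String {
        format!("summary of a {}-character prompt", prompt.len())
    }
}
struct InMemoryLoader(Vec<(&'static str, &'static str)>);
impl Loader for InMemoryLoader {
    fn load(&self) -> Vec<Node> {
        self.0.iter()
            .map(|&(path, text)| Node { path: path.to_string(), chunk: text.to_string(), metadata: Vec::new() })
            .collect()
    }
}
struct SeenPaths(HashSet<String>);
impl NodeCache for SeenPaths {
    fn is_new(&mut self, node: &Node) -> bool {
        self.0.insert(node.path.clone()) // insert returns false if the path was already seen
    }
}
struct ByLine;
impl ChunkerTransformer for ByLine {
    fn chunk(&self, node: Node) -> Vec<Node> {
        // Every chunk keeps the parent's path and a copy of its metadata.
        node.chunk.lines().map(|line| Node { chunk: line.to_string(), ..node.clone() }).collect()
    }
}
struct VecStorage(Vec<Node>);
impl Storage for VecStorage {
    fn store(&mut self, nodes: Vec<Node>) { self.0.extend(nodes); }
}

// Loader -> cache filter -> transformer -> chunker -> storage.
fn run(loader: &impl Loader, cache: &mut impl NodeCache, transformer: &impl Transformer,
       chunker: &impl ChunkerTransformer, storage: &mut impl Storage) {
    let nodes: Vec<Node> = loader
        .load()
        .into_iter()
        .filter(|node| cache.is_new(node))
        .map(|node| transformer.transform(node))
        .flat_map(|node| chunker.chunk(node))
        .collect();
    storage.store(nodes);
}

fn main() {
    let loader = InMemoryLoader(vec![("a.rs", "fn a() {}\nfn b() {}"), ("a.rs", "duplicate"), ("c.rs", "fn c() {}")]);
    let mut storage = VecStorage(Vec::new());
    run(&loader, &mut SeenPaths(HashSet::new()), &AddSummary { llm: FakeLlm }, &ByLine, &mut storage);
    for node in &storage.0 {
        println!("{} | {} | {:?}", node.path, node.chunk, node.metadata);
    }
}
```

Because AddSummary only depends on the prompt trait, swapping the fake model for a real client would not change the transformer itself; that is the kind of extension point the trait-per-step design is meant to give.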
[!NOTE] No integrations are enabled by default as some are code heavy. Either cherry-pick the integrations you need or use the "all" feature flag.
[!WARNING] Because the pipeline is so fast, chunking before adding metadata quickly triggers rate-limit errors on OpenAI, especially with faster models like gpt-3.5-turbo. Be aware.
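The request count explains why the ordering matters. The worked example below assumes that a metadata transformer makes one LLM request per node and that embeddings are requested per batch; the figures of 100 files and 20 chunks per file are hypothetical. Running metadata before chunking costs one request per file, and chunking then copies the metadata to every chunk.

```rust
/// LLM requests made by a per-node metadata step (assumption: one request per node).
fn metadata_requests(files: u64, chunks_per_file: u64, chunk_first: bool) -> u64 {
    if chunk_first {
        files * chunks_per_file // one request per chunk
    } else {
        files // one request per file; chunks inherit the metadata afterwards
    }
}

/// Embedding requests when nodes are embedded in batches of `batch_size`.
fn embedding_requests(nodes: u64, batch_size: u64) -> u64 {
    assert!(batch_size > 0, "batch size must be positive");
    (nodes + batch_size - 1) / batch_size // ceiling division
}

fn main() {
    // Hypothetical repository: 100 files, 20 chunks each.
    let (files, chunks_per_file) = (100, 20);
    println!("metadata after chunking:  {} requests", metadata_requests(files, chunks_per_file, true));
    println!("metadata before chunking: {} requests", metadata_requests(files, chunks_per_file, false));
    let chunks = files * chunks_per_file;
    println!("embedding {} chunks in batches of 10: {} requests", chunks, embedding_requests(chunks, 10));
}
```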
For more examples, please refer to /examples and the documentation.
See the open issues for a full list of proposed features (and known issues).
Swiftide is in a very early stage and we are aware that we lack features for the wider community. Contributions are very welcome. :tada:
If you have a great idea, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
If you just want to contribute (bless you!), see our issues.
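1. Fork the repository
2. Create your feature branch (git checkout -b feature/AmazingFeature)
3. Commit your changes (git commit -m 'feat: Add some AmazingFeature')
4. Push to the branch (git push origin feature/AmazingFeature)
5. Open a pull request

See CONTRIBUTING for more.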
Distributed under the MIT License. See LICENSE for more information.