Crates.io | qdrant-datafusion |
lib.rs | qdrant-datafusion |
version | 0.1.1 |
created_at | 2025-08-24 20:20:18.242586+00 |
updated_at | 2025-08-24 22:07:08.317107+00 |
description | Qdrant integration for Apache DataFusion |
homepage | https://github.com/georgeleepatterson/qdrant-datafusion |
repository | https://github.com/georgeleepatterson/qdrant-datafusion |
max_upload_size | |
id | 1808753 |
size | 277,356 |
Qdrant DataFusion Integration

A high-performance Rust library that integrates the Qdrant vector database with Apache DataFusion, enabling SQL queries over vector data with support for heterogeneous collections, complex projections, and mixed vector types.
Add this to your Cargo.toml:
[dependencies]
qdrant-datafusion = "0.1"
use std::sync::Arc;

use datafusion::prelude::*;
use qdrant_client::Qdrant;
use qdrant_datafusion::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    // Connect to Qdrant
    let client = Qdrant::from_url("http://localhost:6334").build()?;

    // Create DataFusion table provider
    let table_provider = QdrantTableProvider::try_new(client, "my_collection").await?;

    // Register with DataFusion context (register_table expects an Arc'd provider)
    let ctx = SessionContext::new();
    ctx.register_table("vectors", Arc::new(table_provider))?;

    // Query with SQL!
    let df = ctx.sql("
        SELECT id, payload, embedding
        FROM vectors
        WHERE id IN ('doc1', 'doc2')
        LIMIT 10
    ").await?;

    let results = df.collect().await?;
    println!("{:?}", results);
    Ok(())
}
// Complex projections with mixed vector types
let df = ctx.sql("
    SELECT
        id,
        text_embedding,
        image_embedding,
        multi_embeddings,
        keywords_indices,
        keywords_values
    FROM mixed_vectors
    WHERE payload IS NOT NULL
").await?;

// Efficient schema projection - only fetches requested vector fields
let df = ctx.sql("SELECT text_embedding FROM vectors").await?;
Vector Type | Schema | Description | Example Query |
---|---|---|---|
Dense | List<Float32> | Single embedding per field | SELECT text_embedding FROM docs |
Multi | List<List<Float32>> | Multiple embeddings per field | SELECT multi_embeddings FROM docs |
Sparse | List<UInt32> + List<Float32> | Efficient sparse vectors | SELECT keywords_indices, keywords_values FROM docs |
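To make the sparse row of the table concrete, here is a minimal sketch of a sparse vector held as two parallel lists, one of indices and one of values. The `SparseVector` type and its methods are hypothetical and use only the standard library; they illustrate the layout implied by the `List<UInt32>` + `List<Float32>` column pair, not this crate's internals.

```rust
/// Hypothetical sparse vector stored as two parallel lists, mirroring the
/// `List<UInt32>` + `List<Float32>` column pair from the table.
#[derive(Debug, Clone, PartialEq)]
pub struct SparseVector {
    pub indices: Vec<u32>,
    pub values: Vec<f32>,
}

impl SparseVector {
    /// Build from a dense slice, keeping only the non-zero entries.
    pub fn from_dense(dense: &[f32]) -> Self {
        let mut indices = Vec::new();
        let mut values = Vec::new();
        for (i, &v) in dense.iter().enumerate() {
            if v != 0.0 {
                indices.push(i as u32);
                values.push(v);
            }
        }
        Self { indices, values }
    }

    /// Expand back to a dense vector of length `dim`.
    /// Indices at or beyond `dim` are ignored.
    pub fn to_dense(&self, dim: usize) -> Vec<f32> {
        let mut dense = vec![0.0; dim];
        for (&i, &v) in self.indices.iter().zip(&self.values) {
            if let Some(slot) = dense.get_mut(i as usize) {
                *slot = v;
            }
        }
        dense
    }
}
```

Under this layout, a sparse field named `keywords` would surface as the two columns `keywords_indices` and `keywords_values`, which is why the example query selects both.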
Collections with multiple named vector fields where different points can have different subsets:
-- Schema automatically includes all possible vector fields
SELECT
id,
text_embedding, -- Some points have this
image_embedding, -- Some points have this
audio_embedding -- Some points have this
FROM heterogeneous_collection;
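One way to picture the query above: every row is projected onto the full list of vector fields, and a point that lacks a given field yields a null in that column. The sketch below assumes that behavior and uses a hypothetical `Point` type and `project_row` helper built on the standard library only; it is not the crate's actual code.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a Qdrant point carrying named dense vectors.
pub struct Point {
    pub id: String,
    pub vectors: HashMap<String, Vec<f32>>,
}

/// Project one point onto a fixed list of vector field names.
/// A field the point does not carry becomes `None` (a SQL NULL).
pub fn project_row(point: &Point, schema: &[&str]) -> Vec<Option<Vec<f32>>> {
    schema
        .iter()
        .map(|name| point.vectors.get(*name).cloned())
        .collect()
}
```

For a point that only has `text_embedding`, projecting onto `["text_embedding", "image_embedding", "audio_embedding"]` gives one populated column followed by two nulls.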
Collections with a single unnamed vector field:
-- Schema contains single 'vector' field
SELECT id, payload, vector
FROM homogeneous_collection;
Complete TableProvider Implementation (DataFusion)
Production Ready
In Development
Planned
DataFusion sources
similarity(), recommend(), discover() like functions

Run the test suite with a real Qdrant instance:
# Start Qdrant
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
# Run tests
cargo test --features test-utils
# Check coverage
just coverage
Built around a schema-driven architecture that reduces complex matching logic and leaves room for future expansion:
// Schema defines extractors upfront
enum FieldExtractor {
    Id(StringBuilder),
    Payload(StringBuilder),
    DenseVector { name: String, builder: ListBuilder<Float32Builder> },
    MultiVector { name: String, builder: ListBuilder<ListBuilder<Float32Builder>> },
    SparseIndices { name: String, builder: ListBuilder<UInt32Builder> },
    SparseValues { name: String, builder: ListBuilder<Float32Builder> },
}

// Single pass processing with owned iteration
pub fn append_point(&mut self, point: ScoredPoint) {
    let ScoredPoint { id, payload, vectors, .. } = point;
    let vector_lookup = build_vector_lookup(vectors);
    for extractor in &mut self.field_extractors {
        // All logic inline - no hidden abstractions
    }
}
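The outline above leaves the loop body empty. As a rough illustration of what the single pass might look like, the following sketch swaps the Arrow builders for plain `Vec` columns and handles only ids and dense vectors. `Point`, `BatchBuilder`, and the `column` fields are hypothetical names chosen for this example, not the crate's real types.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified point: an id plus named dense vectors.
pub struct Point {
    pub id: String,
    pub vectors: HashMap<String, Vec<f32>>,
}

/// One extractor per projected column; each owns its output buffer.
/// Plain `Vec`s stand in for the Arrow builders shown in the outline.
pub enum FieldExtractor {
    Id(Vec<String>),
    DenseVector { name: String, column: Vec<Option<Vec<f32>>> },
}

pub struct BatchBuilder {
    pub field_extractors: Vec<FieldExtractor>,
}

impl BatchBuilder {
    /// Consume one point and append exactly one value to every column.
    pub fn append_point(&mut self, point: Point) {
        let Point { id, mut vectors } = point;
        for extractor in &mut self.field_extractors {
            match extractor {
                FieldExtractor::Id(column) => column.push(id.clone()),
                FieldExtractor::DenseVector { name, column } => {
                    // `remove` moves the vector out of the map (no copy);
                    // a missing field yields `None`, i.e. a null entry.
                    column.push(vectors.remove(name.as_str()));
                }
            }
        }
    }
}
```

Because the extractor list is fixed before any point is processed, a projection such as `SELECT text_embedding` would create only one extractor in this sketch, and vectors that were not requested are simply dropped with the point.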
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
Licensed under the Apache License, Version 2.0. See LICENSE for details.