Crates.io | geo-index |
lib.rs | geo-index |
version | 0.1.1 |
source | src |
created_at | 2024-01-15 04:42:51.944282 |
updated_at | 2024-01-15 04:54:37.008108 |
description | Fast, static, ABI-stable spatial indexes. |
homepage | |
repository | https://github.com/kylebarron/geo-index |
max_upload_size | |
id | 1100052 |
size | 107,859 |
A Rust crate for packed, static, zero-copy spatial indexes.
rstar
.Vec<u8>
, compatible with the flatbush
and kdbush
JavaScript libraries. Being ABI-stable means that the spatial index can be shared zero-copy between Rust and another program like Python.i8
, u8
, i16
, u16
, i32
, u32
, f32
, f64
.rayon
feature for parallelizing part of the sort in the sort-tile-recursive (STRSort
) method.u64
.rstar
: a dynamic RTree implementation.kdtree
: a dynamic KDTree implementation.kdbush
: a port of kdbush
but does not strive for FFI-compatibility.static_aabb2d_index
: a port of flatbush
but does not strive for FFI-compatibility.static-bushes
: a port of flatbush
and kdbush
but does not strive for FFI-compatibility.@mourner's amazing flatbush
and kdbush
libraries are the fastest JavaScript libraries for static R-trees and k-d trees. Part of their appeal in the browser is that they're backed by a single, contiguous buffer, and thus can be moved from the main thread to another thread (a web worker) without any copies.
By porting and expanding on those JavaScript libraries and ensuring that the internal memory layout is exactly maintained, we can bootstrap zero-copy use cases both inside and outside the browser. In-browser use cases can interop between a rust-based WebAssembly module and the upstream JS libraries without copies. Outside-browser use cases can interop between multiple Rust libraries or between Rust and Python without copies.
I'm excited about Rust to speed up Python and JavaScript via compiled extension modules. It's true that you can create Python bindings to a Rust library, have Rust manage the memory, and never need to worry about zero-copy data structures. But when someone else writes a C library that would like to interface with your data, if you don't have an ABI-stable way to share the data, you need to serialize it and they need to deserialize it, which is costly.
For example, in Python, Shapely (and by extension the C library GEOS) is used for most geospatial data storage. But separate Python libraries with C extensions can't use the same GEOS memory because the underlying storage isn't ABI-stable. So there has to be a serde step in between.
GeoArrow solves this problem for geospatial vector data, because it defines a language-independent, ABI-stable memory layout. So you can safely move memory between Python/Rust/C just by exchanging pointer information.
But it's very useful to be able to share large spatial data, declare that the data is already spatially ordered, and share a spatial index, all at no cost.
Currently, this library is used under the hood in geoarrow-rs
to speed up boolean operations and spatial joins. But over a medium-term time horizon, I believe that exposing the raw index data will enable exciting FFI use cases that are presently impossible.
This is just one benchmark; I recommend benchmarking with your own data, but this indicates construction is ~2x faster than rstar
and search is ~33% faster.
cargo bench --bench rtree
construction (geo-index, hilbert)
time: [80.503 ms 80.891 ms 81.350 ms]
construction (geo-index, STRTree)
time: [115.60 ms 116.52 ms 117.64 ms]
construction (geo-index, hilbert, f64 to f32, including casting)
time: [86.409 ms 86.681 ms 86.984 ms]
construction (geo-index f32)
time: [78.292 ms 78.393 ms 78.514 ms]
construction (rstar bulk)
time: [158.48 ms 159.34 ms 160.29 ms]
search (flatbush) time: [115.97 µs 116.41 µs 116.86 µs]
search (flatbush STRTree)
time: [115.85 µs 117.57 µs 118.95 µs]
search (flatbush f32) time: [113.04 µs 114.56 µs 115.99 µs]
search (rstar) time: [151.53 µs 153.62 µs 155.84 µs]
With the rayon
feature, the sorting phase of the STRTree
is faster:
cargo bench --bench rtree --features rayon
construction (geo-index, STRTree)
time: [71.825 ms 72.099 ms 72.382 ms]
change: [-38.738% -38.125% -37.570%] (p = 0.00 < 0.05)
Performance has improved.