succinctly

Crates.iosuccinctly
lib.rssuccinctly
version0.4.0
created_at2026-01-11 03:18:59.78516+00
updated_at2026-01-24 03:49:08.738792+00
descriptionHigh-performance succinct data structures for Rust
homepagehttps://github.com/rust-works/succinctly
repositoryhttps://github.com/rust-works/succinctly
max_upload_size
id2035145
size3,119,572
John Ky (newhoggy)

documentation

https://docs.rs/succinctly

README

succinctly

CI crates.io docs.rs License: MIT

High-performance succinct data structures for Rust.

Succinctly provides space-efficient data structures with fast rank and select operations, optimized for both x86_64 (with AVX2/AVX-512) and ARM (NEON) architectures. The library is no_std compatible and designed for high-throughput applications.

Features

  • Bitvector with O(1) rank and O(log n) select - Poppy-style 3-level directory with ~3% space overhead
  • Balanced parentheses for tree navigation - RangeMin structure with O(1) operations and ~6% overhead
  • JSON semi-indexing with SIMD acceleration - Up to 880 MiB/s throughput on x86_64 (AMD Zen 4) with table-driven PFSM parser
  • YAML semi-indexing - Complete YAML 1.2 parser with anchor/alias resolution (~250-400 MiB/s)
  • DSV/CSV semi-indexing - High-performance CSV/TSV parsing (85-1676 MiB/s) with BMI2 acceleration
  • jq/yq-style query expressions - Navigate JSON and YAML without full parsing
  • no_std compatible - Works in embedded and WASM environments
  • Cross-platform SIMD - Runtime detection for AVX2, AVX-512, SSE4.2, and ARM NEON

Installation

Add to your Cargo.toml:

[dependencies]
succinctly = "0.2"

Or with cargo:

cargo add succinctly

Quick Start

Bitvector with Rank/Select

use succinctly::{BitVec, RankSelect};

// Create a bitvector from u64 words
let words = vec![0b1010_1010_1010_1010u64; 8];
let bv = BitVec::from_words(words, 512);

// Query rank (count of 1-bits in [0, i))
assert_eq!(bv.rank1(8), 4);

// Query select (position of k-th 1-bit, 0-indexed)
assert_eq!(bv.select1(0), Some(1));  // First 1-bit is at position 1
assert_eq!(bv.select1(3), Some(7));  // Fourth 1-bit is at position 7

Balanced Parentheses for Tree Navigation

use succinctly::bp::BalancedParens;

// Encode a tree as balanced parentheses: ((()())())
// In bits: 1=open, 0=close -> 1110100100
let bp = BalancedParens::new(&[0b0010010111], 10);

// Find matching close parenthesis
assert_eq!(bp.find_close(0), Some(9));  // Outermost pair
assert_eq!(bp.find_close(1), Some(6));  // Second level

// Navigate the tree
assert_eq!(bp.first_child(0), Some(1));   // First child of root
assert_eq!(bp.next_sibling(1), Some(7));  // Sibling of first child

JSON Semi-Indexing

use succinctly::json::{JsonIndex, StandardJson};

let json = br#"{"users": [{"name": "Alice"}, {"name": "Bob"}]}"#;
let index = JsonIndex::build(json);
let root = index.root(json);

// Navigate without parsing the entire document
if let StandardJson::Object(obj) = root {
    if let Some(StandardJson::Array(users)) = obj.get("users") {
        // Iterate array elements
        for user in users {
            // Access nested fields efficiently
        }
    }
}

jq-Style Queries

use succinctly::jq::{parse, eval, QueryResult};
use succinctly::json::{JsonIndex, StandardJson};

let json = br#"{"users": [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]}"#;
let index = JsonIndex::build(json);
let cursor = index.root(json);

// Get all user names
let expr = parse(".users[].name").unwrap();
if let QueryResult::Many(names) = eval(&expr, cursor) {
    // names contains ["Alice", "Bob"]
}

// Get first user's age
let expr = parse(".users[0].age").unwrap();
if let QueryResult::One(StandardJson::Number(age)) = eval(&expr, cursor) {
    assert_eq!(age, 30.0);
}

Performance

Comparison with Rust JSON Parsers

Benchmark comparing succinctly against popular Rust JSON libraries (x86_64, 1MB file):

Library Parse Time Throughput Peak Memory vs succinctly
sonic-rs 1.00 ms 810 MiB/s 9.97 MB 1.6x faster, 26x more memory
succinctly 1.58 ms 510 MiB/s 382 KB baseline
serde_json 4.83 ms 167 MiB/s 7.00 MB 3.1x slower, 18x more memory
simd-json 5.10 ms 158 MiB/s 17.1 MB 3.2x slower, 46x more memory

Key findings:

  • 18-46x less memory than DOM parsers (consistent 46% overhead vs JSON size)
  • Competitive parsing speed: Only 1.6x slower than fastest DOM parser
  • 3-5x faster than serde_json and simd-json

See docs/benchmarks/rust-parsers.md for detailed benchmarks across all file sizes.

JSON Semi-Indexing

Platform Implementation Throughput Notes
x86_64 (AMD Ryzen 9 7950X) PFSM (BMI2) 679 MiB/s Table-driven, fastest on x86
AVX2 732 MiB/s 32 bytes/iteration
Scalar 494 MiB/s Baseline
ARM (Apple M1 Max) NEON 574 MiB/s 32 bytes/iteration
Scalar (PFSM) 557 MiB/s Table-driven
Scalar 405 MiB/s Baseline
ARM (Neoverse-V1) NEON/PFSM ~500 MiB/s Estimated based on jq perf

Rank/Select Operations

Platform Rank (O(1)) Select (O(log n))
x86_64 (AMD Ryzen 9 7950X) ~3 ns ~50 ns
ARM (Apple M1 Max) ~21 ns ~320 ns

jq Comparison: Identity Operation (.)

Comparison of succinctly jq . vs jq . for formatting/printing JSON files.

x86_64 (AMD Ryzen 9 7950X, 10MB files)

Pattern jq succinctly Speedup jq Mem succ Mem Mem Ratio
nested 147.8ms 54.5ms 2.7x 22 MB 26 MB 1.16x
strings 141.3ms 59.0ms 2.4x 14 MB 14 MB 1.02x
numbers 268.7ms 127.3ms 2.1x 87 MB 15 MB 0.18x
pathological 796.7ms 410.2ms 1.9x 472 MB 17 MB 0.04x
comprehensive 381.9ms 220.9ms 1.7x 149 MB 14 MB 0.09x
users 205.6ms 125.4ms 1.6x 68 MB 12 MB 0.18x
unicode 169.4ms 113.0ms 1.5x 31 MB 16 MB 0.51x
arrays 738.2ms 388.9ms 1.9x 367 MB 17 MB 0.05x
literals 279.5ms 228.7ms 1.2x 50 MB 16 MB 0.33x

ARM (Neoverse-V1, 1MB files)

Pattern jq succinctly Speedup
nested 23.8ms 8.8ms 2.7x
strings 21.2ms 9.4ms 2.3x
arrays 80.1ms 42.5ms 1.9x
users 28.8ms 17.4ms 1.7x
comprehensive 45.6ms 27.9ms 1.6x

ARM (Apple M1 Max, 10MB files)

Pattern jq succinctly Speedup jq Mem succ Mem Mem Ratio
nested 347.2ms 54.8ms 6.3x 25 MB 29 MB 1.17x
strings 331.0ms 75.2ms 4.4x 16 MB 17 MB 1.07x
pathological 1.35s 677.0ms 2.0x 526 MB 20 MB 0.04x
unicode 334.8ms 157.1ms 2.1x 41 MB 19 MB 0.46x
users 413.7ms 196.8ms 2.1x 70 MB 15 MB 0.21x
comprehensive 692.4ms 364.7ms 1.9x 135 MB 17 MB 0.13x
numbers 370.7ms 213.8ms 1.7x 97 MB 18 MB 0.19x
arrays 1.07s 700.8ms 1.5x 368 MB 20 MB 0.05x
literals 510.9ms 397.1ms 1.3x 103 MB 19 MB 0.19x

Key findings:

  • 1.2-6.3x faster across all patterns
  • 5-25x less memory on most patterns due to streaming lazy evaluation
  • Best speedup on nested/string-heavy data

Platform-Specific Optimizations

Platform Operation Throughput Speedup
x86_64 Popcount (AVX-512 VPOPCNTDQ) 96.8 GiB/s 5.2x vs scalar
ARM (M1 Max) NEON JSON (string-heavy) 3.7 GiB/s 1.69x vs scalar
ARM (Neoverse-V1) Popcount (NEON) 46.5 GiB/s N/A
NEON movemask (parallel) 1.37 ns 2.5x vs serial

See docs/benchmarks/rust-parsers.md, docs/benchmarks/jq.md, and docs/optimizations/history.md for detailed benchmarks.

Feature Flags

Popcount Strategies (mutually exclusive)

Feature Description
(default) Uses Rust's count_ones() which auto-vectorizes
simd. Explicit SIMD intrinsics (NEON on ARM, POPCNT on x86)
portable-popcount Portable bitwise algorithm (no intrinsics)

Other Features

Feature Description
std Enable std library (default, required for runtime CPU detection)
serde Enable serialization/deserialization support
cli Build the CLI tool
regex Enable regex support in jq queries

Test Features

Feature Description
large-tests Test with 1GB bitvectors (~125MB RAM)
huge-tests Test with 5GB bitvectors (~625MB RAM)
mmap-tests Memory-mapped file tests

CLI Tool

The library includes CLI tools for JSON, YAML, and DSV operations:

# Build the CLI
cargo build --release --features cli

# Generate synthetic JSON/YAML for benchmarking
./target/release/succinctly json generate 10mb -o benchmark.json
./target/release/succinctly yaml generate 10kb -o benchmark.yaml

# Query JSON files (jq-compatible)
./target/release/succinctly jq '.users[].name' input.json
./target/release/succinctly jq -r '.users[0]' input.json

# Query YAML files (yq-compatible)
./target/release/succinctly yq '.users[].name' config.yaml
./target/release/succinctly yq -o json '.' config.yaml          # Output as JSON
./target/release/succinctly yq '.spec.containers[]' k8s.yaml
./target/release/succinctly yq --doc 0 '.' multi-doc.yaml       # First document only

# Query CSV/TSV files (converted to JSON on-the-fly)
./target/release/succinctly jq --input-dsv ',' '.[] | .[0]' users.csv
./target/release/succinctly jq --input-dsv '\t' '.[0]' data.tsv

# Output as CSV/TSV/DSV
./target/release/succinctly jq -r '.users[] | [.name, .age] | @csv' data.json
./target/release/succinctly jq -r '.users[] | [.name, .age] | @dsv("|")' data.json
./target/release/succinctly yq -r '.users[] | [.name, .age] | @csv' data.yaml

# Find jq/yq expression for a position (useful for editor integration)
./target/release/succinctly jq-locate input.json --offset 42
./target/release/succinctly jq-locate input.json --line 5 --column 10
./target/release/succinctly yq-locate config.yaml --offset 42
./target/release/succinctly yq-locate config.yaml --line 5 --column 10

Architecture

Module Structure

succinctly
├── bits         # Bitvector with rank/select
├── trees        # Tree encodings (balanced parentheses)
├── json         # JSON semi-indexing
├── yaml         # YAML semi-indexing
├── dsv          # DSV/CSV semi-indexing
└── jq           # jq/yq query language evaluator

Core Data Structures

  • bits::BitVec - Bitvector with 3-level Poppy-style rank directory (~3% overhead) and sampled select index (~1-3% overhead)
  • trees::BalancedParens - Hierarchical min-excess structure for O(1) tree navigation (~6% overhead)
  • json::JsonIndex - Semi-index combining Interest Bits (IB) and Balanced Parentheses (BP) for fast JSON navigation
  • yaml::YamlIndex - YAML semi-index with anchor/alias resolution and multi-document support
  • dsv::DsvIndex - Lightweight DSV semi-index with BMI2-accelerated quote masking

SIMD Support

The library uses runtime CPU feature detection to select the best implementation:

Platform Features Used
x86_64 AVX2, AVX-512 VPOPCNTDQ, SSE4.2, SSE2
aarch64 NEON (mandatory)

Documentation

Choose your path:

For AI-assisted development, see CLAUDE.md.

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Before submitting a PR:

# Run tests
cargo test

# Run clippy
cargo clippy --all-targets --all-features -- -D warnings

# Format code
cargo fmt

License

Licensed under the MIT license.

Acknowledgments

This library implements algorithms from:

Related Work

  • haskell-works - Haskell implementations of the same techniques (hw-json, hw-json-simd, hw-rankselect, hw-balancedparens)
Commit count: 510

cargo fmt