suffixsort

Crates.iosuffixsort
lib.rssuffixsort
version0.3.0
created_at2025-08-22 16:09:56.654685+00
updated_at2025-08-22 22:21:26.266035+00
descriptionLibrary for suffix (inverse lexicographic) sorting
homepage
repositoryhttps://github.com/suffixsort/suffixsort
max_upload_size
id1806559
size31,693
x48 (hex48)

documentation

README

suffixsort

A high-performance Rust library for inverse lexicographic (suffix) sorting, providing both high-level processing utilities and low-level comparison functions.

Features

  • Inverse Lexicographic Sorting: Compare strings from the last character towards the first
  • Flexible Configuration: Multiple sorting modes including dictionary order, case insensitivity, and reverse sorting
  • Unicode NFC Normalization: Optional Unicode normalization for consistent sorting of equivalent sequences
  • High Performance: Parallel processing using Rayon for handling large datasets efficiently
  • Dual API: Both high-level line processing and low-level comparator functions
  • Zero-Cost Abstractions: Minimal performance overhead through Rust's zero-cost abstractions
  • Optimized Defaults: Fastest performance with all options disabled by default

Performance Characteristics

suffixsort is designed for maximum performance with all options disabled (default configuration). The library is extremely fast by default, processing millions of lines per second on modern hardware.

Performance-Neutral Options

These options have minimal impact on performance:

  • reverse: Simply inverts comparison results
  • right_align: Adds padding during output formatting only
  • exclude_no_word: Simple filtering during processing
  • word_only: Affects output formatting only

Performance-Impactful Options

These options may reduce performance when enabled:

  • normalize: Unicode NFC normalization adds processing overhead during key extraction
  • stable: Stable sorting algorithms are generally slower than unstable variants
  • ignore_case: Case folding during key extraction adds minor overhead
  • dictionary_order: More complex key extraction logic
  • use_entire_line: Simpler key extraction but may use more memory

For maximum throughput with large datasets, use the default configuration (all options disabled).

Project Structure

This library is part of a workspace that includes:

  • core/: The suffixsort library crate (this crate)
  • cli/: The ssort command-line interface that uses this library

Installation

Add to your Cargo.toml:

[dependencies]
suffixsort = ">=0.1"

Usage

High-Level API

The main entry point is the SortConfig struct which allows you to configure and execute the sorting process:

use suffixsort::{SortConfig, ProcessedLine};

let config = SortConfig {
    ignore_case: true,
    reverse: false,
    dictionary_order: true,
    // ... other configuration options
    ..Default::default()
};

let lines = vec![
    "Apple".to_string(),
    "banana".to_string(),
    "Cherry".to_string(),
];

let (processed, padding_info) = config.process_lines(lines);

for line in processed {
    println!("{}", line.original);
}

Low-Level API

For advanced use cases, you can use the comparator function directly:

use suffixsort::SortConfig;
use std::cmp::Ordering;

let config = SortConfig {
    ignore_case: true,  // Note: this does not affect the low-level comparator!
    reverse: false,
    ..Default::default()
};

// The low-level comparator does not apply ignore_case or normalization.
// Users must pre-process strings if needed.
let comparer = config.get_comparer();
let mut words = vec!["Banana", "apple", "Cherry"];

// Use with standard sort
words.sort_by(|a, b| comparer(a, b));

// Or with parallel sort (requires Rayon)
use rayon::prelude::*;
words.par_sort_by(|a, b| comparer(a, b));

Configuration Options

The SortConfig struct provides these options:

  • ignore_case: Case-insensitive comparison (minimal performance impact, applied during key extraction)
  • use_entire_line: Use entire line instead of first word for sorting (simpler but may use more memory)
  • dictionary_order: Ignore non-alphabetic characters when finding first word (performance impact)
  • reverse: Reverse the sort order (performance-neutral)
  • stable: Use stable sorting algorithm (performance impact)
  • right_align: Right-align output with padding (performance-neutral)
  • exclude_no_word: Exclude lines without words (performance-neutral)
  • word_only: Output only the word used for sorting (performance-neutral)
  • normalize: Normalize Unicode to NFC form (performance impact)

Performance

The library is designed for high performance with large datasets:

  • Parallel processing using Rayon's work-stealing scheduler
  • Zero-copy operations where possible
  • Efficient character-by-character comparison
  • Minimal memory allocation
  • Optimized defaults for maximum throughput

Examples

See the cli directory for a complete command-line implementation using this library.

License

MIT OR Apache-2.0

Commit count: 21

cargo fmt