tilesort

Crates.io	tilesort
lib.rs	tilesort
version	0.2.0
created_at	2025-11-10 02:12:50.854748+00
updated_at	2025-11-18 04:34:18.682118+00
description	A sorting algorithm optimized for datasets with pre-sorted contiguous blocks (tiles)
homepage	https://github.com/evanjpw/tilesort
repository	https://github.com/evanjpw/tilesort
max_upload_size
id	1924658
size	215,009

Evan Williams (evanjpw)

documentation

README

Tilesort

A sorting algorithm optimized for datasets with pre-sorted contiguous blocks (tiles).

Overview

Tilesort is a specialized sorting algorithm that achieves high performance when your data consists of non-overlapping, pre-sorted contiguous blocks. Instead of sorting individual elements, tilesort identifies these "tiles" and arranges them as discrete units.

When to Use Tilesort

Tilesort is particularly effective when:

Data arrives in pre-sorted chunks or batches
Data structures maintain sorted regions
Distributed systems produce sorted shards that need merging
Merging sorted log files or event streams
Processing time-series data with sorted segments
You have k tiles in a dataset of n elements where k << n

When NOT to Use Tilesort

Do not use tilesort if:

Your data is randomly shuffled - Tilesort has O(n²) worst-case complexity when there are no pre-sorted tiles. Use standard sort algorithms instead.
You don't know if your data has tiles - If your data doesn't have pre-sorted regions, tilesort will be significantly slower than standard sorting.
The number of tiles approaches the number of elements (k ≈ n) - The overhead of tile detection and management provides no benefit when most elements are in their own tile.

Tilesort is a specialized algorithm for a specific data pattern. If you're unsure whether your data has pre-sorted tiles, use a standard sorting algorithm.

Performance

For a dataset of n elements partitioned into k tiles:

Time Complexity: O(n + k²) where k << n
Space Complexity: O(n) for the output buffer
Best Case: O(n) when data is already sorted (k = 1)
Typical Case: Significantly faster than O(n log n) when k is small

The performance is primarily determined by k (number of tiles) rather than n (total elements), making it highly efficient when the number of tiles is much smaller than the total number of elements.

Installation

Python

pip install tilesort

Requirements: Python 3.8-3.14

Rust

Add to your Cargo.toml:

[dependencies]
tilesort = "0.1.0"

Usage

Python

The Python API mirrors Python's built-in list.sort() and sorted() functions:

import tilesort

# Sort a list in place (like list.sort())
data = [3, 4, 5, 1, 2, 6, 7, 8]
tilesort.sort(data)
print(data)  # [1, 2, 3, 4, 5, 6, 7, 8]

# Return a sorted copy (like sorted())
data = [3, 4, 5, 1, 2, 6, 7, 8]
sorted_data = tilesort.sorted(data)
print(sorted_data)  # [1, 2, 3, 4, 5, 6, 7, 8]
print(data)         # [3, 4, 5, 1, 2, 6, 7, 8] (unchanged)

# Sort with a key function
words = ["elephant", "cat", "dog", "a", "bear"]
tilesort.sort(words, key=len)
print(words)  # ["a", "cat", "dog", "bear", "elephant"]

# Sort in reverse order
numbers = [3, 1, 4, 1, 5, 9, 2, 6]
tilesort.sort(numbers, reverse=True)
print(numbers)  # [9, 6, 5, 4, 3, 2, 1, 1]

# Combine key and reverse
data = [-5, -3, -1, 2, 4]
tilesort.sort(data, key=abs, reverse=True)
print(data)  # [-5, 4, -3, 2, -1]

# Sort custom objects
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

people = [Person("Alice", 30), Person("Bob", 25), Person("Charlie", 35)]
tilesort.sort(people, key=lambda p: p.age)
# Now sorted by age: Bob (25), Alice (30), Charlie (35)

Rust

use tilesort::{tilesort, tilesorted, tilesort_by_key, tilesort_reverse};

fn main() {
    // Sort in place
    let mut data = vec![3, 4, 5, 1, 2, 6, 7, 8];
    tilesort(&mut data);
    println!("{:?}", data);  // [1, 2, 3, 4, 5, 6, 7, 8]

    // Return a sorted copy
    let data = vec![3, 4, 5, 1, 2, 6, 7, 8];
    let sorted = tilesorted(&data);
    println!("{:?}", sorted);  // [1, 2, 3, 4, 5, 6, 7, 8]
    println!("{:?}", data);    // [3, 4, 5, 1, 2, 6, 7, 8] (unchanged)

    // Sort in reverse
    let mut data = vec![3, 1, 4, 1, 5, 9, 2, 6];
    tilesort_reverse(&mut data);
    println!("{:?}", data);  // [9, 6, 5, 4, 3, 2, 1, 1]

    // Sort with a key function
    let mut data = vec![-5i32, -3, -1, 2, 4];
    tilesort_by_key(&mut data, |&x| x.abs());
    println!("{:?}", data);  // [-1, 2, -3, 4, -5]

    // Sort strings by length
    let mut words = vec!["elephant", "cat", "dog", "a", "bear"];
    tilesort_by_key(&mut words, |s| s.len());
    println!("{:?}", words);  // ["a", "cat", "dog", "bear", "elephant"]

    // Sort custom structs
    #[derive(Clone)]
    struct Person {
        name: String,
        age: u32,
    }

    let mut people = vec![
        Person { name: "Alice".to_string(), age: 30 },
        Person { name: "Bob".to_string(), age: 25 },
        Person { name: "Charlie".to_string(), age: 35 },
    ];

    tilesort_by_key(&mut people, |p| p.age);
    // Now sorted by age: Bob (25), Alice (30), Charlie (35)
}

How It Works

Tilesort operates in two phases:

Scan Phase: Identifies contiguous sorted blocks (tiles) in the input data
Restructure Phase: Rearranges tiles in sorted order to produce the final output

The algorithm automatically detects tile boundaries by scanning for order violations. When elements are out of order, a new tile begins. The tiles are then sorted based on their key ranges and concatenated to produce the final sorted sequence.

Example

Given input: [3, 4, 5, 1, 2, 6, 7, 8]

Scan identifies three tiles:
- Tile 0: [3, 4, 5] (range 3-5)
- Tile 1: [1, 2] (range 1-2)
- Tile 2: [6, 7, 8] (range 6-8)
Tiles are sorted by their ranges: Tile 1, Tile 0, Tile 2
Output: [1, 2, 3, 4, 5, 6, 7, 8]

API Reference

Python

tilesort.sort(list, *, key=None, reverse=False) - Sort a list in place
tilesort.sorted(list, *, key=None, reverse=False) - Return a sorted copy

Both functions support:

key: Optional function to extract comparison key from each element
reverse: If True, sort in descending order

Rust

In-place sorting:

tilesort(data: &mut [T]) - Sort in ascending order
tilesort_reverse(data: &mut [T]) - Sort in descending order
tilesort_by_key(data: &mut [T], key_fn: F) - Sort by custom key
tilesort_by_key_reverse(data: &mut [T], key_fn: F) - Sort by custom key, descending

Copying variants:

tilesorted(data: &[T]) -> Vec<T> - Return sorted copy
tilesorted_reverse(data: &[T]) -> Vec<T> - Return sorted copy, descending
tilesorted_by_key(data: &[T], key_fn: F) -> Vec<T> - Return sorted copy by key
tilesorted_by_key_reverse(data: &[T], key_fn: F) -> Vec<T> - Return sorted copy by key, descending

All functions work with any type T that implements Ord + Clone. Key functions must return a type K that implements Ord.

Development

Building from Source

Rust Library

# Run tests
cargo test

# Build the library
cargo build --release

# Generate documentation
cargo doc --open

Python Package

Requirements:

Rust toolchain (1.71.1+)
Python 3.8-3.14
uv (recommended) or maturin

# Install development dependencies
uv sync --group dev

# Build and install in development mode
maturin develop --features python

# Run Python tests
just test-python
# or: uv run --group dev pytest python/tests/

# Run type checking
just typecheck
# or: uv run --group dev mypy python/

# Run all tests (Rust + Python)
just test

# Run linter
just lint

# Format code
just format

Development Commands (Just)

This project uses Just as a command runner:

just              # List all available commands
just test         # Run all tests (Rust + Python)
just test-rust    # Run Rust tests only
just test-python  # Run Python tests only
just typecheck    # Run mypy type checking
just lint         # Run ruff linter
just format       # Format code with ruff
just build        # Build Python package
just bench        # Run benchmarks
just check        # Run all checks (test + typecheck + lint)
just clean        # Clean build artifacts

Benchmarks

Performance benchmarks compare tilesort against Rust's standard sort across different scenarios:

# Run all benchmarks
cargo bench

# Run specific benchmark group
cargo bench uniform_tiles
cargo bench varied_tiles
cargo bench hybrid_tiles
cargo bench random_data
cargo bench key_function
cargo bench realistic_workload

Benchmark scenarios:

uniform_tiles: All tiles have the same size (~1K elements)
varied_tiles: Tiles of different sizes (100, 1K, 5K, 10K)
hybrid_tiles: Mix of single elements and large blocks
random_data: Completely random (worst case for tilesort)
key_function: Structured data requiring key extraction
realistic_workload: 1M elements with ~10K element tiles (mirrors real-world usage)

Results are saved to target/criterion/ with HTML reports.

License

Licensed under either of:

Apache License, Version 2.0 (LICENSE or http://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE or http://opensource.org/licenses/MIT)

at your option.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Areas for Contribution

Performance benchmarks and optimizations
Additional language bindings
Documentation improvements
Bug reports and fixes

Changelog

See CHANGELOG.md for a list of changes in each release.

Commit count: 0

tilesort

documentation

README

Tilesort

Overview

When to Use Tilesort

When NOT to Use Tilesort

Performance

Installation

Python

Rust

Usage

Python

Rust

How It Works

Example

API Reference

Python

Rust

Development

Building from Source

Rust Library

Python Package

Development Commands (Just)

Benchmarks

License

Contributing

Areas for Contribution

Changelog

cargo fmt