idpath

Crates.ioidpath
lib.rsidpath
version0.1.1
created_at2025-12-26 03:44:23.021338+00
updated_at2025-12-26 03:44:23.021338+00
descriptionHierarchical path encoding for scalable storage systems / 为可扩展存储系统设计的层次化路径编码
homepagehttps://github.com/js0/rust/tree/main/idpath
repositoryhttps://github.com/js0/rust.git
max_upload_size
id2005235
size52,289
i18n.site (i18nsite)

documentation

README

English | 中文


idpath : Efficient Hierarchical Path Encoding for Scalable Storage

Introduction

idpath provides utilities to encode 64-bit integers into hierarchical filesystem paths. By placing low-order bits at higher directory levels, it ensures uniform distribution of files across directories, preventing filesystem performance degradation caused by directory saturation.

Features

  • Load Balancing: Low-bits-first strategy ensures even distribution even with sequential IDs.
  • Zero-Allocation Decoding: Utilizes HipStr and stack buffers to eliminate heap allocations during successful decoding.
  • Dynamic Depth: Adjusts path depth (1 to 3 levels) based on ID magnitude with unique suffixes (_, -, ~).
  • Crockford Base32: Uses a URL-safe, human-readable character set.
  • Robust Error Handling: Provides detailed context-aware error messages with paths.

Usage

use idpath::{encode, decode};

fn main() -> idpath::Result<()> {
  let id = 1234567u64;
  let prefix = "/storage";

  // Encode
  let path = encode(prefix, id);
  println!("Path: {}", path);

  // Decode
  let decoded_id = decode(&path)?;
  assert_eq!(id, decoded_id);

  Ok(())
}

Design Rationale

Modern filesystems (like ext4 or XFS) experience performance drops when a single directory contains thousands of entries. Hierarchical structures mitigate this. By reversing bit significance during path construction (Low-Bits-First), idpath avoids hotspots even when IDs are generated sequentially (e.g., from databases).

graph TD
    ID[u64 ID] --> B32[Crockford Base32]
    B32 --> Reorder{Bits Reordering}
    Reorder -- "len 0-2" --> P1[Level 1: prefix/xx_]
    Reorder -- "len 3" --> P2[Level 2: prefix/xx/x-]
    Reorder -- "len 4" --> P3[Level 2: prefix/xx/xx~]
    Reorder -- "len >=5" --> P4[Level 3: prefix/xx/xx/xxx]
    P1/P2/P3/P4 --> Output[HipStr Path]

Technology Stack

  • Rust: Language providing safety and performance.
  • fast32: High-performance Crockford Base32 implementation.
  • hipstr: Shared string optimization reducing memory footprints.

Directory Structure

.
├── src/
│   ├── lib.rs   # Core encoding/decoding logic
│   └── error.rs # Custom error definitions
└── tests/
    └── main.rs  # Integration tests

API Reference

Functions

  • encode(prefix: impl AsRef<str>, id: u64) -> HipStr<'static> Encodes an ID into a path string.
  • decode(path: impl AsRef<str>) -> Result<u64> Decodes a path string back into an ID.

Constants

  • DEPTH1: b'_', suffix for 1-level paths.
  • DEPTH2: b'-', suffix for 2-level paths (3-char encoded).
  • DEPTH3: b'~', suffix for 2-level paths (4-char encoded).

Types/Errors

  • Result<T>: Alias for std::result::Result<T, Error>.
  • Error: Enum containing InvalidPath(HipStr) and DecodeFailed(HipStr).

History

Hierarchical directory structures for data distribution date back to early Unix systems. Large-scale caching proxies like Squid and version control systems like Git popularized this "fan-out" approach. Git, for instance, stores objects in a .git/objects/xx/xxxxxxxx... structure using the first two hex digits of a SHA-1 hash. idpath extends this concept by prioritizing low-order bits, specifically optimized for numeric IDs where sequential growth is common.


About

This project is an open-source component of js0.site ⋅ Refactoring the Internet Plan.

We are redefining the development paradigm of the Internet in a componentized way. Welcome to follow us:


idpath : 为可扩展存储系统设计的层次化路径编码

项目介绍

idpath 工具用于将 64 位整数编码为层次化文件系统路径。通过将低位比特放置在目录的高层级,确保文件均匀分布在不同目录中,防止目录由于条目过多导致文件系统性能下降。

功能特性

  • 负载均衡:低位优先策略保证即使 ID 连续增长,也能实现均匀分布。
  • 零分配解码:结合 HipStr 与栈分配缓冲区,成功解码过程中无需堆分配。
  • 动态深度:根据 ID 数值大小自动调节路径深度(1 至 3 层),并辅以唯一后缀(_, -, ~)。
  • Crockford Base32:采用 URL 安全且易读的字符集。
  • 健壮错误处理:提供包含路径上下文的详细错误信息。

使用演示

use idpath::{encode, decode};

fn main() -> idpath::Result<()> {
  let id = 1234567u64;
  let prefix = "/data";

  // 编码
  let path = encode(prefix, id);
  println!("编码路径: {}", path);

  // 解码
  let decoded_id = decode(&path)?;
  assert_eq!(id, decoded_id);

  Ok(())
}

设计核心

现代文件系统(如 ext4 或 XFS)在单个目录包含数千个条目时性能会大幅下降。层次化结构能有效缓解此问题。idpath 在构建路径时反转比特重要性(低位优先),有效规避了 ID 顺序生成(例如数据库自增主键)产生的存储热点。

graph TD
    ID[u64 ID] --> B32[Crockford Base32]
    B32 --> Reorder{比特重排}
    Reorder -- "长度 0-2" --> P1[1层深度: prefix/xx_]
    Reorder -- "长度 3" --> P2[2层深度: prefix/xx/x-]
    Reorder -- "长度 4" --> P3[2层深度: prefix/xx/xx~]
    Reorder -- "长度 >=5" --> P4[3层深度: prefix/xx/xx/xxx]
    P1/P2/P3/P4 --> Output[HipStr 路径]

技术堆栈

  • Rust:兼顾安全与高性能的编程语言。
  • fast32:高性能 Crockford Base32 实现。
  • hipstr:共享字符串优化,降低内存占用。

目录结构

.
├── src/
│   ├── lib.rs   # 核心编解码逻辑
│   └── error.rs # 自定义错误定义
└── tests/
    └── main.rs  # 集成测试

API 介绍

函数

  • encode(prefix: impl AsRef<str>, id: u64) -> HipStr<'static> 将 ID 编码为路径字符串。
  • decode(path: impl AsRef<str>) -> Result<u64> 将路径字符串解码回 ID。

常量

  • DEPTH1: b'_',1 层深度的后缀。
  • DEPTH2: b'-',2 层深度的后缀(编码长度为 3)。
  • DEPTH3: b'~',2 层深度的后缀(编码长度为 4)。

类型与错误

  • Result<T>std::result::Result<T, Error> 的别名。
  • Error:包含 InvalidPath(HipStr)DecodeFailed(HipStr) 的枚举。

历史趣闻

层次化目录结构用于数据分发的设计最早可追溯到早期 Unix 系统。Squid 缓存代理和 Git 版本控制系统等大型工具使这种“扇出”(fan-out)方法广为人知。以 Git 为例,它利用 SHA-1 哈希值的前两位作为目录名(如 .git/objects/xx/xxxxxxxx...)来存储对象。idpath 扩展了这一理念,通过优先处理低位比特,专门针对数字 ID 顺序增长的场景进行了优化,实现了更优的磁盘分布规律。


关于

本项目为 js0.site ⋅ 重构互联网计划 的开源组件。

我们正在以组件化的方式重新定义互联网的开发范式,欢迎关注:

Commit count: 0

cargo fmt