sipha-source

Version: 0.3.0
Created: 2025-11-22 10:27:23.477262+00
Updated: 2025-11-22 10:27:23.477262+00
Repository: https://github.com/NyalephTheCat/sipha
Author: Nyaleph (NyalephTheCat)

Centralized source file management and byte-offset to line/column conversion for sipha.

This crate provides utilities for managing source files, converting between byte offsets and line/column positions, and extracting source snippets for diagnostics and error reporting.

Features

  • Efficient Position Conversion: O(log n) conversion from byte offsets to line/column positions
  • UTF-8 Support: Column numbers are character positions for UTF-8 content
  • Non-UTF-8 Support: Handle binary and non-UTF-8 encoded files with byte-based column calculations
  • Source Snippet Extraction: Extract code snippets with context for error messages
  • Multi-file Support: Manage multiple source files in a project
  • Minimal Dependencies: Only depends on sipha-core

Usage

Basic Usage

use sipha_source::SourceFile;
use sipha_core::span::Span;

// Create a source file
let source = SourceFile::new(
    "fn main() {\n    println!(\"Hello\");\n}".to_string(),
    None,
);

// Convert byte offset to line/column
// Byte 16 is the `p` of `println!`: line 2, after four spaces of indentation
let pos = source.byte_to_line_col(16).unwrap();
assert_eq!(pos.line(), 2);
assert_eq!(pos.column(), 5);

// Extract source text for a span (bytes 16..23, end exclusive)
let span = Span::new(16, 23);
let text = source.extract_span(span).unwrap();
assert_eq!(text, "println");

Loading from File

use sipha_source::SourceFile;
use std::path::Path;

// Load as UTF-8 (fails if not valid UTF-8)
let source = SourceFile::from_path(Path::new("src/main.rs"))?;

// Load as bytes (supports any encoding)
let source = SourceFile::from_path_bytes(Path::new("data.bin"))?;

Non-UTF-8 Content

use sipha_source::SourceFile;

// Create from bytes (supports non-UTF-8)
let binary = vec![0xFF, 0xFE, 0x00, 0x01];
let source = SourceFile::from_bytes(binary, None);

// For non-UTF-8 content, columns are byte-based
let pos = source.byte_to_line_col(2).unwrap();
assert_eq!(pos.column(), 3); // Byte position, not character

// Access content as bytes
let bytes = source.content_bytes();
// content() returns None for non-UTF-8
assert_eq!(source.content(), None);

Multi-file Management

use sipha_source::SourceMap;
use std::path::{Path, PathBuf};

let mut map = SourceMap::new();
map.add_file(
    PathBuf::from("src/main.rs"),
    "fn main() {}".to_string(),
);

let file = map.get_file(Path::new("src/main.rs")).unwrap();

Source Snippets

use sipha_source::SourceFile;
use sipha_core::span::Span;

let source = SourceFile::new(
    "line 1\nline 2\nline 3\nline 4\nline 5".to_string(),
    None,
);

let span = Span::new(14, 20); // "line 3" (bytes 14..20, end exclusive)
let snippet = source.extract_snippet(span, 1).unwrap();

// snippet.lines contains the lines with context
// snippet.highlight_span is the span to highlight
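
To make the idea of "lines with context" concrete, here is a minimal standalone sketch of context-line selection. It does not use sipha-source: the `Snippet` struct, its fields, and the free function `extract_snippet` below are hypothetical stand-ins, and the span is assumed to be a half-open byte range.

```rust
/// Hypothetical snippet type (not the sipha-source one): the selected lines
/// plus the 1-based number of the first selected line.
#[derive(Debug, PartialEq)]
pub struct Snippet<'a> {
    pub first_line: usize,
    pub lines: Vec<&'a str>,
}

/// Collect the lines touched by the byte range `start..end`,
/// plus up to `context` lines before and after.
pub fn extract_snippet(src: &str, start: usize, end: usize, context: usize) -> Option<Snippet<'_>> {
    if start > end || end > src.len() {
        return None;
    }
    let lines: Vec<&str> = src.split('\n').collect();
    // 0-based line of a byte offset = number of newlines before it.
    let line_of = |offset: usize| {
        src.as_bytes()[..offset].iter().filter(|&&b| b == b'\n').count()
    };
    // The range is half-open, so the last covered byte is `end - 1`
    // (or `start` itself for an empty range).
    let last_byte = if end > start { end - 1 } else { start };
    let first = line_of(start).saturating_sub(context);
    let last = (line_of(last_byte) + context).min(lines.len() - 1);
    Some(Snippet {
        first_line: first + 1,
        lines: lines[first..=last].to_vec(),
    })
}
```

With the five-line source above, the range 14..20 and one line of context, this sketch selects "line 2", "line 3" and "line 4", starting at line 2.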

Performance

  • Line Map Construction: O(n) where n is the source length
  • Byte to Line/Column: O(log n) using binary search
  • Line/Column to Byte: O(m) where m is the line length (character iteration)
  • Source Extraction: O(1) for simple spans, O(k) for snippets with k context lines
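
The first two bounds follow from a common design: scan the source once to record where each line starts, then binary-search that table. The sketch below illustrates the technique only. `LineIndex` is a hypothetical type, not the one in this crate, and its columns are byte-based for brevity.

```rust
/// Hypothetical line index (not the sipha-source type):
/// records the byte offset at which each line starts.
pub struct LineIndex {
    line_starts: Vec<usize>,
    len: usize,
}

impl LineIndex {
    /// O(n): one pass over the source, pushing the offset after every newline.
    pub fn new(src: &str) -> Self {
        let mut line_starts = vec![0];
        for (i, b) in src.bytes().enumerate() {
            if b == b'\n' {
                line_starts.push(i + 1);
            }
        }
        LineIndex { line_starts, len: src.len() }
    }

    /// O(log n): binary search over the line starts.
    /// Returns a 1-based (line, byte column) pair, or None if out of range.
    pub fn byte_to_line_col(&self, offset: usize) -> Option<(usize, usize)> {
        if offset > self.len {
            return None;
        }
        // Number of line starts <= offset, which is the 1-based line number.
        let line = self.line_starts.partition_point(|&start| start <= offset);
        let line_start = self.line_starts[line - 1];
        Some((line, offset - line_start + 1))
    }
}
```

For the source "fn main() {\n    println!(\"Hello\");\n}", the table is [0, 12, 35], so offset 16 resolves to line 2, column 5.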

UTF-8 and Non-UTF-8 Handling

UTF-8 Content

For UTF-8 content, column numbers are character positions, not byte positions:

  • Multi-byte UTF-8 characters (like 世界) count as one column
  • Emoji count as one column
  • The column number is a character index, not a byte offset

Example:

use sipha_source::SourceFile;

let source = SourceFile::new("hello 世界".to_string(), None);
// Byte 9 is the start of 界 (世 occupies bytes 6..9)
let pos = source.byte_to_line_col(9).unwrap();
// pos.column() is 8 (character position), not 10 (byte position)
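
One way to compute a character column is to count the characters in the line's prefix, and to walk the characters again for the reverse direction. The two helpers below are a hypothetical standalone sketch using only the standard library. They are not functions of this crate, and they assume 1-based columns.

```rust
/// 1-based character column of byte offset `byte_in_line` within `line`.
/// Returns None if the offset is out of range or falls inside a multi-byte character.
pub fn char_column(line: &str, byte_in_line: usize) -> Option<usize> {
    // `str::get` rejects out-of-range and non-boundary offsets.
    let prefix = line.get(..byte_in_line)?;
    Some(prefix.chars().count() + 1)
}

/// Inverse: byte offset within `line` of the 1-based character column `col`.
/// The column one past the last character maps to `line.len()`.
pub fn column_to_byte(line: &str, col: usize) -> Option<usize> {
    let index = col.checked_sub(1)?;
    line.char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(line.len()))
        .nth(index)
}
```

For "hello 世界", byte 9 (where 界 starts) maps to column 8, and column 8 maps back to byte 9. Both directions iterate over the characters of the line, so they take time proportional to the line length.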

Non-UTF-8 Content

For non-UTF-8 content (binary files, other encodings), column numbers are byte positions:

  • Each byte counts as one column
  • No character decoding is performed
  • Use from_bytes() or from_path_bytes() for non-UTF-8 content

Example:

use sipha_source::SourceFile;

let binary = vec![0xFF, 0xFE, 0x00];
let source = SourceFile::from_bytes(binary, None);
let pos = source.byte_to_line_col(1).unwrap();
// pos.column() is 2 (byte position)
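
The byte-based rule can be illustrated without the crate. The helpers below are hypothetical and use a plain linear scan rather than a prebuilt line table. They assume 1-based lines and columns, and that only the byte 0x0A ends a line.

```rust
/// 1-based (line, column) for a raw byte buffer, where every byte is one column.
pub fn byte_line_col(bytes: &[u8], offset: usize) -> Option<(usize, usize)> {
    if offset > bytes.len() {
        return None;
    }
    let before = &bytes[..offset];
    // Line number: newlines seen before the offset, plus one.
    let line = before.iter().filter(|&&b| b == b'\n').count() + 1;
    // Line start: the byte after the last newline, or 0 on the first line.
    let line_start = before.iter().rposition(|&b| b == b'\n').map_or(0, |i| i + 1);
    Some((line, offset - line_start + 1))
}

/// Text view of the buffer, or None when it is not valid UTF-8.
pub fn as_text(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}
```

For the buffer [0xFF, 0xFE, 0x00], offset 1 gives line 1, column 2, and `as_text` returns None because 0xFF cannot start a UTF-8 sequence.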

Integration with Diagnostics

This crate is designed to work with sipha-error for better diagnostics:

use sipha_source::SourceFile;
use sipha_core::span::Span;
use sipha_error::Diagnostic;

let source = SourceFile::new("fn main() {}".to_string(), None);
let span = Span::new(3, 7); // "main"
let diagnostic = Diagnostic::builder()
    .message("Error message")
    .spans(vec![span])
    .build();

// Format with source file (future integration)
// let formatted = diagnostic.format_with_source_file(&source);
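
Until that integration exists, the kind of output it might produce can be sketched by hand. The function below is a hypothetical, standard-library-only illustration. It is not part of sipha-source or sipha-error, and the "line:col: message" layout with a caret underline is an assumed format.

```rust
/// Render a one-line diagnostic for the byte range `start..end` of `src`:
/// a "line:col: message" header, the source line, and a caret underline.
pub fn render_diagnostic(src: &str, start: usize, end: usize, message: &str) -> Option<String> {
    if start > end
        || end > src.len()
        || !src.is_char_boundary(start)
        || !src.is_char_boundary(end)
    {
        return None;
    }
    // Bounds of the line containing `start`.
    let line_start = src[..start].rfind('\n').map_or(0, |i| i + 1);
    let line_end = src[start..].find('\n').map_or(src.len(), |i| start + i);
    // 1-based line number and character column.
    let line_no = src[..start].matches('\n').count() + 1;
    let col = src[line_start..start].chars().count() + 1;
    // Underline only the part of the range on this line; at least one caret.
    let underline_end = end.min(line_end);
    let width = src[start..underline_end].chars().count().max(1);
    Some(format!(
        "{}:{}: {}\n{}\n{}{}",
        line_no,
        col,
        message,
        &src[line_start..line_end],
        " ".repeat(col - 1),
        "^".repeat(width),
    ))
}
```

Called with "fn main() {}", the range 3..7 and "Error message", it returns the header "1:4: Error message", the source line, and four carets under "main".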

License

MIT
