| Crates.io | scribe-scanner |
| lib.rs | scribe-scanner |
| version | 0.5.1 |
| created_at | 2025-09-13 14:51:53.698193+00 |
| updated_at | 2025-12-01 20:16:02.719694+00 |
| description | High-performance file system scanning and indexing for Scribe |
| homepage | https://github.com/sibyllinesoft/scribe |
| repository | https://github.com/sibyllinesoft/scribe |
| id | 1837753 |
| size | 454,738 |
High-performance file system scanning and indexing for Scribe repository analysis.
scribe-scanner is the foundational crate responsible for efficiently traversing repositories, filtering files, detecting languages, and building the initial file metadata that feeds into Scribe's analysis pipeline. It handles repositories of any size—from small projects to enterprise codebases with 100k+ files.
It uses rayon for multi-core file system traversal and honors .gitignore, .scribeignore, and custom patterns.

    Repository → Scanner → FileMetadata → Analysis Pipeline
        ↓           ↓            ↓
    .gitignore   Filter       Language
     Patterns    Engine       Detection
        ↓                        ↓
      Ignore                 AST Parser
      Rules                 (tree-sitter)
- RepositoryScanner: Main entry point for repository traversal, with configurable scanning options.
- FileMetadata: Rich metadata structure for scanned files.
- IgnoreEngine: Handles pattern matching for file exclusion:
  - .gitignore parsing using the ignore crate
  - .scribeignore patterns
- LanguageDetector: Determines file language and characteristics.
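To make the language-detection step concrete, here is a minimal sketch of extension-based detection using only the standard library. The function name `detect_language` and the extension table are assumptions for illustration; the real LanguageDetector may use different rules and cover far more languages.

```rust
use std::path::Path;

/// Hypothetical helper: maps a file extension to a language label.
/// The mapping below is illustrative, not the crate's actual table.
fn detect_language(path: &Path) -> Option<&'static str> {
    // Files without an extension (e.g. `Makefile`) yield `None`.
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "rs" => Some("Rust"),
        "py" => Some("Python"),
        "js" | "mjs" => Some("JavaScript"),
        "ts" => Some("TypeScript"),
        "go" => Some("Go"),
        _ => None,
    }
}
```

Calling `detect_language(Path::new("src/main.rs"))` returns `Some("Rust")`, while unknown or missing extensions return `None` so the caller can decide how to treat them.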
Basic repository scan:

    use std::path::PathBuf;

    use scribe_scanner::{RepositoryScanner, ScanConfig};

    // Run this inside an async function that returns a `Result`,
    // because `scan()` is awaited and its error is propagated with `?`.
    let config = ScanConfig {
        root_path: PathBuf::from("."),
        max_file_size: 1_000_000, // 1MB
        exclude_tests: true,
        ..Default::default()
    };

    let scanner = RepositoryScanner::new(config);
    let files = scanner.scan().await?;

    println!("Scanned {} files", files.len());
    for file in files {
        println!("{}: {} ({})", file.path.display(), file.language, file.size);
    }
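For readers who want a feel for what a scan does underneath, the following is a rough, sequential sketch of a size-limited and depth-limited directory walk using only `std::fs`. It is not the crate's implementation (which is described as parallel via rayon); `FileEntry` and `walk` are hypothetical names, and the depth convention (root is depth 0) is an assumption.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical, simplified stand-in for one scan result.
#[derive(Debug)]
struct FileEntry {
    path: PathBuf,
    size: u64,
}

/// Collects regular files under `root`, skipping files larger than
/// `max_file_size` bytes. With `max_depth = Some(n)`, directories more
/// than `n` levels below `root` are not entered.
fn walk(root: &Path, max_file_size: u64, max_depth: Option<usize>) -> io::Result<Vec<FileEntry>> {
    let mut found = Vec::new();
    // Explicit stack of (directory, depth) pairs instead of recursion.
    let mut stack = vec![(root.to_path_buf(), 0usize)];
    while let Some((dir, depth)) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            // `DirEntry::file_type` does not follow symlinks, so links are skipped.
            let file_type = entry.file_type()?;
            let path = entry.path();
            if file_type.is_dir() {
                if max_depth.map_or(true, |limit| depth < limit) {
                    stack.push((path, depth + 1));
                }
            } else if file_type.is_file() {
                let size = entry.metadata()?.len();
                if size <= max_file_size {
                    found.push(FileEntry { path, size });
                }
            }
        }
    }
    // Sort for deterministic output regardless of directory iteration order.
    found.sort_by(|a, b| a.path.cmp(&b.path));
    Ok(found)
}
```

A call such as `walk(Path::new("."), 1_000_000, None)` roughly corresponds to the `max_file_size` and `max_depth` options shown in this document, minus ignore rules and language detection.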
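Custom filtering with glob patterns:

    use scribe_scanner::{PatternSet, RepositoryScanner, ScanConfig};

    let mut config = ScanConfig::default();

    // Skip logs, JavaScript dependencies, and Python virtual environments.
    config.exclude_patterns = PatternSet::new(vec![
        "**/*.log",
        "**/node_modules/**",
        "**/.venv/**",
    ]);

    // Restrict the scan to Rust sources under src/ and Python sources under lib/.
    config.include_patterns = PatternSet::new(vec![
        "src/**/*.rs",
        "lib/**/*.py",
    ]);

    let scanner = RepositoryScanner::new(config);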
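How include and exclude patterns interact is not spelled out here, so the sketch below shows one plausible rule under stated assumptions: excludes always win, and an empty include list admits every path. `glob_match` and `is_selected` are hypothetical helpers, and the matcher handles only `*`, a leading or embedded `**/`, and a trailing `**`; it is not the crate's PatternSet.

```rust
/// Minimal glob matcher (illustrative only).
/// `*` matches within one path component, `**/` matches zero or more
/// directories, and a trailing `**` matches everything that remains.
fn glob_match(pattern: &str, path: &str) -> bool {
    fn go(p: &[u8], s: &[u8]) -> bool {
        match p {
            [] => s.is_empty(),
            [b'*', b'*', b'/', rest @ ..] => {
                // Zero directories, or skip ahead past any '/' and retry.
                go(rest, s) || (0..s.len()).any(|i| s[i] == b'/' && go(rest, &s[i + 1..]))
            }
            [b'*', b'*'] => true,
            [b'*', rest @ ..] => {
                let mut i = 0;
                loop {
                    if go(rest, &s[i..]) {
                        return true;
                    }
                    // A single `*` never crosses a directory separator.
                    if i == s.len() || s[i] == b'/' {
                        return false;
                    }
                    i += 1;
                }
            }
            [c, rest @ ..] => match s {
                [d, tail @ ..] if c == d => go(rest, tail),
                _ => false,
            },
        }
    }
    go(pattern.as_bytes(), path.as_bytes())
}

/// Assumed precedence: any exclude match rejects the path; otherwise the
/// path is kept if the include list is empty or one include pattern matches.
fn is_selected(path: &str, include: &[&str], exclude: &[&str]) -> bool {
    if exclude.iter().any(|p| glob_match(p, path)) {
        return false;
    }
    include.is_empty() || include.iter().any(|p| glob_match(p, path))
}
```

With the patterns from the example above, `src/scanner/walk.rs` would be selected, while `src/node_modules/x/lib.rs` would be rejected by the exclude list even though it matches an include pattern.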
Git churn analysis:

    use scribe_scanner::git::ChurnAnalyzer;

    // Both calls return a `Result`, so run this inside a function that can propagate errors with `?`.
    let analyzer = ChurnAnalyzer::new(".")?;
    let churn_data = analyzer.analyze_file("src/main.rs")?;

    println!("Changes: {}", churn_data.commit_count);
    println!("Last modified: {}", churn_data.last_change);
    println!("Recent activity score: {:.2}", churn_data.recency_score);
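The formula behind `recency_score` is not given here. Purely as an illustration of how such a score could weight recent commits more heavily, the sketch below uses exponential decay with a half-life; the function, its parameters, and the formula itself are assumptions and may not match what ChurnAnalyzer computes.

```rust
/// Hypothetical recency score: each commit contributes 0.5^(age / half_life),
/// so a commit made today adds 1.0 and one made `half_life_days` ago adds 0.5.
fn recency_score(commit_ages_days: &[f64], half_life_days: f64) -> f64 {
    commit_ages_days
        .iter()
        .map(|age| 0.5f64.powf(age / half_life_days))
        .sum()
}
```

Under this assumed formula, a file with many old commits can score lower than a file with a few commits from the past week, which is the kind of distinction a recency-weighted churn metric is meant to capture.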
ScanConfig Options

| Field | Type | Default | Description |
|---|---|---|---|
| `root_path` | `PathBuf` | `"."` | Repository root directory |
| `max_file_size` | `usize` | 1MB | Skip files larger than this |
| `exclude_tests` | `bool` | `false` | Exclude test files from scan |
| `follow_symlinks` | `bool` | `false` | Follow symbolic links |
| `include_patterns` | `PatternSet` | Empty | Glob patterns for inclusion |
| `exclude_patterns` | `PatternSet` | Empty | Glob patterns for exclusion |
| `max_depth` | `Option<usize>` | None | Maximum directory depth |
| `parallel_threads` | `usize` | CPU count | Scanner thread pool size |
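The defaults in the table can be read as a `Default` implementation. The sketch below mirrors the documented fields in a stand-in struct; `ScanConfigSketch` is a hypothetical name, the pattern sets are shown as plain string lists because the real `PatternSet` type is not reproduced here, and reading "1MB" as 1,000,000 bytes follows the usage example rather than a confirmed constant.

```rust
use std::path::PathBuf;
use std::thread;

/// Illustrative mirror of the documented options (not the crate's type).
#[derive(Debug, Clone)]
struct ScanConfigSketch {
    root_path: PathBuf,
    max_file_size: usize,
    exclude_tests: bool,
    follow_symlinks: bool,
    include_patterns: Vec<String>, // stand-in for `PatternSet`
    exclude_patterns: Vec<String>, // stand-in for `PatternSet`
    max_depth: Option<usize>,
    parallel_threads: usize,
}

impl Default for ScanConfigSketch {
    fn default() -> Self {
        Self {
            root_path: PathBuf::from("."),
            max_file_size: 1_000_000, // "1MB", assumed to mean decimal megabytes
            exclude_tests: false,
            follow_symlinks: false,
            include_patterns: Vec::new(),
            exclude_patterns: Vec::new(),
            max_depth: None,
            // "CPU count": fall back to one thread if it cannot be queried.
            parallel_threads: thread::available_parallelism()
                .map(|n| n.get())
                .unwrap_or(1),
        }
    }
}
```

Struct update syntax then overrides only what differs from the defaults, for example `ScanConfigSketch { exclude_tests: true, ..Default::default() }`.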
scribe-scanner is designed as a foundational crate used by higher-level components:

- FileMetadata is consumed for AST parsing and import extraction

Related crates and documentation:

- scribe-patterns: Advanced pattern matching and glob support
- scribe-analysis: AST parsing and semantic analysis
- scribe-core: Shared types and configuration
- ../../ARCHITECTURE.md: Overall system design