| Crates.io | scribe-scaling |
| lib.rs | scribe-scaling |
| version | 0.5.1 |
| created_at | 2025-09-13 19:29:18.426355+00 |
| updated_at | 2025-12-01 20:16:27.527825+00 |
| description | High-performance scaling optimizations for large repositories |
| homepage | https://github.com/sibyllinesoft/scribe |
| repository | https://github.com/sibyllinesoft/scribe |
| max_upload_size | |
| id | 1838057 |
| size | 311,319 |
High-performance scaling optimizations for large repository analysis in Scribe.
scribe-scaling is the performance and optimization layer that lets Scribe handle repositories of any size, from small projects to enterprise codebases with 100k+ files. It implements a streaming architecture, intelligent caching, parallel processing, and context positioning to keep analysis under a second on small repositories and under 30 seconds on massive ones.
```
Repository → Streaming Scan → Parallel Analysis → Selection + Positioning → Caching  →  Output
    ↓              ↓                  ↓                      ↓                  ↓           ↓
 Metadata     Progressive      Multi-threaded         3-Tier Context        Blake3     Optimized
  First         Loading          AST/Scoring         HEAD/MIDDLE/TAIL      Signature     Bundle
                   ↓                  ↓                                        ↓
             Backpressure       Work Stealing                              LRU + Disk
               Control          Load Balance                                 Cache
```
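The "Blake3 Signature" stage in the diagram refers to content-hash cache keys: a file's cached analysis stays valid until its bytes change. The sketch below illustrates that idea with the `blake3` crate; the `cache_key` helper is illustrative, not part of scribe-scaling's API.

```rust
use std::path::Path;

// Hypothetical helper: derive a cache key from file contents, so cached
// analysis results can be reused until the file's bytes actually change.
fn cache_key(path: &Path) -> std::io::Result<String> {
    let bytes = std::fs::read(path)?;
    let signature = blake3::hash(&bytes); // fast content hash
    Ok(format!("{}:{}", path.display(), signature.to_hex()))
}
```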
- `ScalingSelector`: Main entry point for scaled repository processing
- `StreamingScanner`: Progressive file system traversal
- `ParallelAnalyzer`: Multi-threaded file analysis
- `ContextPositioner`: Optimizes file order for LLM recall
- `CacheManager`: Persistent caching system
- `AdaptiveConfig`: Auto-tuning configuration
```rust
use scribe_scaling::{PerformanceTarget, ScalingConfig, ScalingSelector};

let config = ScalingConfig {
    performance_target: PerformanceTarget::Balanced,
    enable_caching: true,
    enable_positioning: true,
    ..Default::default()
};

let selector = ScalingSelector::new(config);
let result = selector.select_and_process(repo_path).await?;

println!(
    "Analyzed {} files in {:?}",
    result.files_processed, result.elapsed_time
);
println!("Cache hit rate: {:.1}%", result.cache_stats.hit_rate * 100.0);
```
```rust
use scribe_scaling::{ContextPositioningConfig, ScalingConfig, ScalingSelector};

let mut config = ScalingConfig::default();
config.positioning = ContextPositioningConfig {
    enable_positioning: true,
    head_percentage: 0.20, // 20% high-priority files at HEAD
    tail_percentage: 0.20, // 20% core files at TAIL
    centrality_weight: 0.5,
    query_relevance_weight: 0.3,
    relatedness_weight: 0.2,
};

let selector = ScalingSelector::new(config);
let result = selector
    .select_and_process_with_query(
        repo_path,
        Some("authentication middleware"), // query hint
    )
    .await?;

if result.has_context_positioning() {
    let ordered = result.get_optimally_ordered_files();
    println!("HEAD: {} files", ordered.head.len());
    println!("MIDDLE: {} files", ordered.middle.len());
    println!("TAIL: {} files", ordered.tail.len());
}
```
```rust
use futures::StreamExt; // for `.next()` on the file stream
use indicatif::{ProgressBar, ProgressStyle};
use scribe_scaling::{ScanConfig, StreamingScanner};

let scanner = StreamingScanner::new(ScanConfig::default());

let progress = ProgressBar::new_spinner();
progress.set_style(
    ProgressStyle::default_spinner()
        .template("{spinner} [{elapsed}] {msg} ({pos} files)")
        .expect("valid progress template"), // indicatif 0.17+ returns a Result here
);

let threshold = 0.75; // example score cutoff
let mut file_stream = scanner.scan_streaming(repo_path).await?;

while let Some(file) = file_stream.next().await {
    progress.set_message(format!("Scanning {}", file.path.display()));
    progress.inc(1);

    // Process file metadata immediately; load and analyze content
    // only for high-scoring files.
    if file.score > threshold {
        let content = file.load_content().await?;
        analyze(content).await?; // your analysis step
    }
}

progress.finish_with_message("Scan complete");
```
```rust
use scribe_scaling::{PerformanceTarget, ScalingConfig};

// Fast mode: prioritize speed over completeness.
let fast_config = ScalingConfig {
    performance_target: PerformanceTarget::Speed,
    parallel_threads: Some(num_cpus::get()),
    cache_size_mb: 500,
    max_file_size: 500_000, // skip large files
    enable_positioning: false,
    ..Default::default()
};

// Quality mode: prioritize completeness and quality.
let quality_config = ScalingConfig {
    performance_target: PerformanceTarget::Quality,
    parallel_threads: Some(4), // fewer threads, more thorough
    cache_size_mb: 2000,
    max_file_size: 5_000_000,
    enable_positioning: true,
    ..Default::default()
};

// Balanced mode (default).
let balanced_config = ScalingConfig::default();
```
```rust
use scribe_scaling::cache::{CacheConfig, CacheManager};
use std::path::PathBuf;

let cache_config = CacheConfig {
    cache_dir: PathBuf::from(".scribe-cache"),
    max_size_mb: 1000,
    compression_level: 6,
    ttl_hours: 24,
};

let cache = CacheManager::new(cache_config)?;

// Check cache status.
let stats = cache.stats();
println!("Cache entries: {}", stats.entry_count);
println!("Total size: {} MB", stats.size_mb);
println!("Hit rate: {:.1}%", stats.hit_rate * 100.0);

// Clear old entries.
cache.evict_expired()?;

// Clear the entire cache.
cache.clear()?;
```
| Size | Files | Time Target | Memory Target | Strategy |
|---|---|---|---|---|
| Small | ≤1k | <1s | <50MB | In-memory, minimal caching |
| Medium | 1k-10k | <5s | <200MB | Parallel + caching |
| Large | 10k-100k | <15s | <1GB | Streaming + aggressive caching |
| Enterprise | 100k+ | <30s | <2GB | Full optimization suite |
The time and memory targets above are based on internal benchmarks.
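As a rough sketch of how these tiers could map onto a `ScalingConfig` (the thresholds and values below are illustrative, not built-in behavior; this is roughly the kind of decision `AdaptiveConfig` is described as automating):

```rust
use scribe_scaling::{PerformanceTarget, ScalingConfig};

// Illustrative mapping from repository size to a configuration, mirroring the
// tiers in the table above. The thresholds and chosen values are examples only.
fn config_for_repo_size(file_count: usize) -> ScalingConfig {
    let (target, cache_size_mb) = match file_count {
        0..=1_000 => (PerformanceTarget::Speed, 100),             // small: minimal caching
        1_001..=10_000 => (PerformanceTarget::Balanced, 500),     // medium: parallel + caching
        10_001..=100_000 => (PerformanceTarget::Balanced, 1_000), // large: aggressive caching
        _ => (PerformanceTarget::Balanced, 2_000),                // enterprise: full optimization suite
    };
    ScalingConfig {
        performance_target: target,
        cache_size_mb,
        ..Default::default()
    }
}
```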
Transformer models don't attend equally to all tokens: content near the beginning and end of a long context is generally recalled more reliably than content buried in the middle (the "lost in the middle" effect).
Strategy: place the most important files where LLMs attend best, using the steps below (a sketch follows the list).
1. Compute PageRank centrality for all files
2. Score query relevance (if a query is provided)
3. Combined score = `centrality_weight * centrality + query_relevance_weight * relevance`
4. Sort by combined score
5. Top 20% → HEAD
6. Middle 60% → MIDDLE
7. Bottom 20% with highest centrality → TAIL
8. Group related files within each tier
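A minimal sketch of steps 3–7 (step 8's grouping is omitted). The `FileScore` and `Tiers` types and the `assign_tiers` function are illustrative stand-ins, not the crate's actual types; only the scoring formula and the 20/60/20 split come from the steps above.

```rust
use std::path::PathBuf;

// Illustrative types; the crate's real types differ.
struct FileScore {
    path: PathBuf,
    centrality: f64,      // PageRank centrality (step 1)
    query_relevance: f64, // query relevance, 0.0 when no query was given (step 2)
}

struct Tiers {
    head: Vec<FileScore>,
    middle: Vec<FileScore>,
    tail: Vec<FileScore>,
}

fn assign_tiers(
    mut files: Vec<FileScore>,
    centrality_weight: f64,
    query_relevance_weight: f64,
) -> Tiers {
    let n = files.len();
    let head_n = ((n as f64) * 0.20).ceil() as usize;
    let tail_n = ((n as f64) * 0.20).floor() as usize;

    // Steps 3–4: sort by combined score, highest first.
    let combined = |f: &FileScore| {
        centrality_weight * f.centrality + query_relevance_weight * f.query_relevance
    };
    files.sort_by(|a, b| combined(b).total_cmp(&combined(a)));

    // Step 5: the top 20% go to HEAD.
    let mut rest = files.split_off(head_n.min(n));
    let head = files;

    // Step 7: of the remaining files, the highest-centrality ones go to TAIL.
    rest.sort_by(|a, b| b.centrality.total_cmp(&a.centrality));
    let middle = rest.split_off(tail_n.min(rest.len()));
    let tail = rest;

    // Step 6: everything else stays in MIDDLE (step 8's related-file grouping is omitted here).
    Tiers { head, middle, tail }
}
```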
ScalingConfig

| Field | Type | Default | Description |
|---|---|---|---|
| `performance_target` | `PerformanceTarget` | `Balanced` | `Speed`, `Balanced`, or `Quality` |
| `parallel_threads` | `Option<usize>` | CPU count | Thread pool size |
| `cache_size_mb` | `usize` | `1000` | Maximum cache size |
| `enable_caching` | `bool` | `true` | Enable persistent cache |
| `enable_positioning` | `bool` | `true` | Enable context positioning |
| `max_file_size` | `usize` | `1_000_000` | Skip files larger than this |
ContextPositioningConfig

| Field | Type | Default | Description |
|---|---|---|---|
| `enable_positioning` | `bool` | `true` | Enable/disable positioning |
| `head_percentage` | `f64` | `0.20` | Percentage for HEAD section |
| `tail_percentage` | `f64` | `0.20` | Percentage for TAIL section |
| `centrality_weight` | `f64` | `0.4` | Weight for centrality scoring |
| `query_relevance_weight` | `f64` | `0.3` | Weight for query matching |
| `relatedness_weight` | `f64` | `0.3` | Weight for file grouping |
scribe-scaling is used by:
- `docs/context-positioning.md`: Detailed context positioning documentation
- `scribe-selection`: File selection algorithms that scaling optimizes
- `scribe-graph`: PageRank computation used by positioning
- `../../WHY_SCRIBE.md`: Philosophy on performance and intelligence