| Crates.io | scribe-selection |
|---|---|
| lib.rs | scribe-selection |
| version | 0.5.1 |
| created_at | 2025-09-13 06:24:06.769884+00 |
| updated_at | 2025-12-01 20:15:48.202673+00 |
| description | Intelligent code selection and context extraction for Scribe |
| homepage | https://github.com/sibyllinesoft/scribe |
| repository | https://github.com/sibyllinesoft/scribe |
| max_upload_size | |
| id | 1837305 |
| size | 249,039 |
Intelligent file selection and context extraction for Scribe repository bundles.
scribe-selection decides which files go into a repository bundle. Rather than including everything or requiring manual selection, it combines multi-dimensional heuristic scoring, graph centrality (PageRank), and rule-based analysis to automatically identify the code most useful for an LLM to understand the repository.
Entry points (for example `main.rs`, `__init__.py`, `index.js`) contribute to a file's entrypoint score.

The simple router is a transparent, rule-based decision tree for file selection.
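Covering-set selection targets a specific code entity, such as a function or class, and selects the files it depends on or, optionally, the files that depend on it.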
As a selection approaches its token budget, the demotion engine progressively reduces content: full text, then key chunks, then signatures only. This achieves 3-10x compression while preserving critical context.
Scoring weights ship as two presets, V1 and V2 (see the `ScoringWeights` table below), and can also be tuned by hand for specific use cases.
FileMetadata + Scores → Selection Algorithm → Budget Enforcement → Demotion Engine → Final Selection

- FileMetadata + Scores: heuristic scoring and PageRank analysis
- Selection Algorithm: simple-router decision tree
- Budget Enforcement: token checks and hard limits
- Demotion Engine: AST chunking and signature extraction
- Final Selection: selected files with metadata
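The budget-enforcement stage can be illustrated with a greedy pass over already-scored files. This is a minimal sketch under stated assumptions, not the crate's implementation: the `Candidate` type and `enforce_budget` function are hypothetical, and greedy descending-score packing stands in for whatever rules the real selector applies. It honors a token budget and an optional file cap, mirroring the `token_budget` and `max_files` options in the `SelectionConfig` table.

```rust
/// Hypothetical scored file for this sketch; not the crate's `FileMetadata`.
#[derive(Debug, Clone)]
pub struct Candidate {
    pub path: String,
    pub score: f64,
    pub tokens: usize,
}

/// Greedy budget enforcement (illustrative only): visit files by descending
/// score, skip any file that would overflow the token budget, and stop once
/// `max_files` files have been kept. Returns the kept files and tokens used.
pub fn enforce_budget(
    mut candidates: Vec<Candidate>,
    token_budget: usize,
    max_files: Option<usize>,
) -> (Vec<Candidate>, usize) {
    // `total_cmp` gives a total order over f64, so sorting never panics.
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));

    let mut selected = Vec::new();
    let mut used = 0usize;
    for c in candidates {
        if max_files.map_or(false, |m| selected.len() >= m) {
            break; // hard limit on file count reached
        }
        if used + c.tokens <= token_budget {
            used += c.tokens;
            selected.push(c);
        }
        // Otherwise skip it: a smaller, lower-scored file may still fit.
    }
    (selected, used)
}
```

Skipping rather than stopping at the first oversized file lets small, lower-ranked files fill leftover budget; a real selector may prefer to hand such files to the demotion stage instead.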
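`FileScorer` computes multi-dimensional importance scores:

$$
\begin{aligned}
\text{score} ={}& w_{doc}\,\text{doc} + w_{readme}\,\text{readme} + w_{imp}\,\text{imp\_deg} + w_{path}\,\text{path\_depth}^{-1} \\
&+ w_{test}\,\text{test\_link} + w_{churn}\,\text{churn} + w_{centrality}\,\text{centrality} \\
&+ w_{entrypoint}\,\text{entrypoint} + w_{examples}\,\text{examples} + \text{priority\_boost}
\end{aligned}
$$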
Each dimension is normalized to [0, 1] and combined with configurable weights.
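As a rough illustration of the weighted sum, the sketch below combines the seven dimensions that appear in the `ScoringWeights` table; the `readme` and `imp_deg` terms are left out for brevity. The `Features` and `Weights` types and the `score` function are hypothetical, and the sketch assumes the path-depth feature is `1 / (depth + 1)`, the form given later in this README, which keeps a root-level file (depth 0) finite.

```rust
/// Hypothetical per-file features. All fields except `depth` are assumed to be
/// normalized to [0, 1]; `depth` is the raw directory depth.
pub struct Features {
    pub doc: f64,
    pub centrality: f64,
    pub test_link: f64,
    pub churn: f64,
    pub depth: u32,
    pub entrypoint: f64,
    pub examples: f64,
}

/// Hypothetical weights mirroring the documented `ScoringWeights` fields.
pub struct Weights {
    pub documentation: f64,
    pub centrality: f64,
    pub test_linkage: f64,
    pub churn: f64,
    pub path_depth: f64,
    pub entrypoint: f64,
    pub examples: f64,
}

/// Weighted sum of the normalized dimensions plus an additive priority boost.
pub fn score(f: &Features, w: &Weights, priority_boost: f64) -> f64 {
    // Shallower files score higher; depth 0 maps to 1.0 instead of infinity.
    let path = 1.0 / (f.depth as f64 + 1.0);
    w.documentation * f.doc
        + w.centrality * f.centrality
        + w.test_linkage * f.test_link
        + w.churn * f.churn
        + w.path_depth * path
        + w.entrypoint * f.entrypoint
        + w.examples * f.examples
        + priority_boost
}
```

Because the V1 and V2 weights each sum to 1.0, a file that maxes out every dimension at depth 0 scores 1.0 before any boost.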
`SimpleRouter` is the rule-based selection algorithm.

`CoveringSetSelector` targets specific code entities.

`DemotionEngine` performs progressive content reduction.
Basic selection with the simple router:

```rust
use scribe_selection::{Algorithm, SelectionConfig, Selector};

// Runs inside an async fn that returns a Result; `files` is the scanned file list.
let config = SelectionConfig {
    algorithm: Algorithm::SimpleRouter,
    token_budget: 100_000,
    max_files: Some(200),
    exclude_tests: true,
    ..Default::default()
};

let selector = Selector::new(config);
let result = selector.select(files).await?;

println!(
    "Selected {} files using {} tokens",
    result.selected.len(),
    result.total_tokens
);
```
Custom scoring weights:

```rust
use scribe_selection::{ScoringWeights, SelectionConfig, Selector};

let weights = ScoringWeights {
    documentation: 0.3,
    centrality: 0.4, // Emphasize graph importance
    test_linkage: 0.1,
    churn: 0.1,
    path_depth: 0.05,
    entrypoint: 0.05,
    examples: 0.0, // remaining documented field; the weights above already sum to 1.0
};

let config = SelectionConfig {
    scoring_weights: weights,
    ..Default::default()
};

let selector = Selector::new(config);
```
Covering-set selection for a single function (understanding mode):

```rust
use scribe_selection::{CoveringSetConfig, EntityType};

let config = CoveringSetConfig {
    entity_name: "authenticate_user".to_string(),
    entity_type: EntityType::Function,
    max_files: 20,
    max_depth: Some(3),
    include_dependents: false, // understanding mode: follow what the target depends on
    importance_threshold: 0.01,
};

// `selector` and `files` as in the basic example above.
let result = selector.select_covering_set(files, config).await?;
for (file, reason) in result.selected {
    println!("{}: {:?}", file.path.display(), reason);
}

// Output:
// src/auth.rs: Target (contains function)
// src/db.rs: DirectDependency (imported by auth.rs)
// src/config.rs: TransitiveDependency (imported by db.rs, depth 2)
```
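The covering-set idea can be sketched as a breadth-first walk over an import graph, bounded by a maximum depth and a file count. This is a hypothetical sketch, not the crate's `CoveringSetSelector`: the `covering_set` function and `Reason` enum are illustrative (only loosely modeled on the reasons printed above), and plain BFS stands in for whatever ranking, such as `importance_threshold`, the real selector applies. With `include_dependents` set it walks reverse edges instead, which corresponds to impact analysis.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Why a file was included (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Reason {
    Target,
    Dependency { depth: usize },
    Dependent { depth: usize },
}

/// `imports[a]` lists the files that `a` imports. `max_files` counts the target.
pub fn covering_set<'a>(
    imports: &HashMap<&'a str, Vec<&'a str>>,
    target: &'a str,
    max_depth: usize,
    include_dependents: bool,
    max_files: usize,
) -> Vec<(String, Reason)> {
    // Reverse edges: for each file, the files that import it.
    let mut importers: HashMap<&'a str, Vec<&'a str>> = HashMap::new();
    for (&from, tos) in imports {
        for &to in tos {
            importers.entry(to).or_default().push(from);
        }
    }

    let mut out = vec![(target.to_string(), Reason::Target)];
    let mut seen: HashSet<&'a str> = HashSet::from([target]);
    let mut queue: VecDeque<(&'a str, usize)> = VecDeque::from([(target, 0)]);

    while let Some((file, depth)) = queue.pop_front() {
        if depth >= max_depth {
            continue; // do not expand past the depth limit
        }
        // Understanding mode follows imports; impact mode follows importers.
        let edges = if include_dependents { importers.get(file) } else { imports.get(file) };
        for &next in edges.into_iter().flatten() {
            if out.len() >= max_files {
                return out;
            }
            if seen.insert(next) {
                let d = depth + 1;
                let reason = if include_dependents {
                    Reason::Dependent { depth: d }
                } else {
                    Reason::Dependency { depth: d }
                };
                out.push((next.to_string(), reason));
                queue.push_back((next, d));
            }
        }
    }
    out
}
```

BFS visits direct dependencies before transitive ones, so if `max_files` cuts the walk short, the closest files are the ones kept.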
Budgeted selection with progressive demotion:

```rust
use scribe_selection::{DemotionLevel, SelectionConfig, Selector};

let mut config = SelectionConfig::default();
config.demotion_enabled = true; // already the default; shown for clarity
config.demotion_threshold = 0.9; // start demoting at 90% of the budget

let selector = Selector::new(config);
let result = selector.select_with_budget(files, 50_000).await?;

// Check demotion results
for file in &result.selected {
    match file.demotion_level {
        DemotionLevel::Full => println!("{}: full content", file.path.display()),
        DemotionLevel::Chunk => println!("{}: chunked to key sections", file.path.display()),
        DemotionLevel::Signature => println!("{}: signatures only", file.path.display()),
    }
}

println!("Compression ratio: {:.2}x", result.compression_ratio);
println!("Quality score: {:.2}%", result.quality_score * 100.0);
```
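One way to picture threshold-based demotion: keep full content until usage would cross `threshold * budget`, then give each further file the richest level that still fits under the hard budget, or drop it. This sketch assumes files arrive in priority order with known per-level token costs; `Level`, `Costs`, and `plan` are hypothetical names, with levels mirroring `DemotionLevel::{Full, Chunk, Signature}`. With the documented default threshold of `0.85` and a 100,000-token budget, demotion would begin at 85,000 tokens.

```rust
/// Hypothetical demotion levels, richest first.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Level {
    Full,
    Chunk,
    Signature,
}

/// Hypothetical token cost of one file at each level.
pub struct Costs {
    pub full: usize,
    pub chunk: usize,
    pub signature: usize,
}

/// Assigns a level to each file (in priority order), or `None` if nothing fits.
pub fn plan(files: &[Costs], budget: usize, threshold: f64) -> Vec<Option<Level>> {
    // Full content is only allowed while usage stays under the soft limit.
    let soft = (budget as f64 * threshold) as usize;
    let mut used = 0usize;
    files
        .iter()
        .map(|c| {
            let options = [
                (Level::Full, c.full),
                (Level::Chunk, c.chunk),
                (Level::Signature, c.signature),
            ];
            for (level, cost) in options {
                let limit = if level == Level::Full { soft } else { budget };
                if used + cost <= limit {
                    used += cost;
                    return Some(level);
                }
            }
            None // even signatures would overflow the hard budget
        })
        .collect()
}
```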
Impact analysis (what depends on a class):

```rust
use scribe_selection::{CoveringSetConfig, EntityType, InclusionReason};

let config = CoveringSetConfig {
    entity_name: "User".to_string(),
    entity_type: EntityType::Class,
    include_dependents: true, // impact mode: find what depends on this class
    max_depth: Some(2),
    ..Default::default()
};

let result = selector.select_covering_set(files, config).await?;
println!("Changing User class affects {} files:", result.selected.len());
for (file, reason) in result.selected {
    if matches!(reason, InclusionReason::Dependent(_)) {
        println!("  {} will be impacted", file.path.display());
    }
}
```
- `DESIGN.md`, `ARCHITECTURE.md`: 0.9
- Churn: `churn = frequency * recency`
- Path depth: `1 / (depth + 1)`

`SelectionConfig`

| Field | Type | Default | Description |
|---|---|---|---|
| `algorithm` | `Algorithm` | `SimpleRouter` | Selection algorithm to use |
| `token_budget` | `usize` | `100_000` | Maximum tokens in bundle |
| `max_files` | `Option<usize>` | `None` | Maximum number of files |
| `exclude_tests` | `bool` | `false` | Exclude test files |
| `scoring_weights` | `ScoringWeights` | V2 | Weight configuration |
| `demotion_enabled` | `bool` | `true` | Enable progressive demotion |
| `demotion_threshold` | `f64` | `0.85` | Fraction of the token budget at which demotion starts |
`ScoringWeights`

| Field | Type | V1 | V2 | Description |
|---|---|---|---|---|
| `documentation` | `f64` | 0.20 | 0.25 | Documentation scoring weight |
| `centrality` | `f64` | 0.20 | 0.30 | PageRank centrality weight |
| `test_linkage` | `f64` | 0.15 | 0.10 | Test-source relationship weight |
| `churn` | `f64` | 0.15 | 0.10 | Git activity weight |
| `path_depth` | `f64` | 0.15 | 0.10 | Directory depth weight |
| `entrypoint` | `f64` | 0.10 | 0.10 | Entry point detection weight |
| `examples` | `f64` | 0.05 | 0.05 | Example code weight |
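Both presets in the table sum to 1.0, so they redistribute emphasis rather than change the overall scale; V2 shifts weight from test linkage, churn, and path depth toward documentation and centrality. The sketch below encodes the two columns as constants on a hypothetical `WeightPreset` type (not the crate's `ScoringWeights`) and checks the sums.

```rust
/// Hypothetical mirror of the documented `ScoringWeights` fields.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct WeightPreset {
    pub documentation: f64,
    pub centrality: f64,
    pub test_linkage: f64,
    pub churn: f64,
    pub path_depth: f64,
    pub entrypoint: f64,
    pub examples: f64,
}

impl WeightPreset {
    /// Values from the V1 column.
    pub const V1: WeightPreset = WeightPreset {
        documentation: 0.20,
        centrality: 0.20,
        test_linkage: 0.15,
        churn: 0.15,
        path_depth: 0.15,
        entrypoint: 0.10,
        examples: 0.05,
    };

    /// Values from the V2 column (the documented `SelectionConfig` default).
    pub const V2: WeightPreset = WeightPreset {
        documentation: 0.25,
        centrality: 0.30,
        test_linkage: 0.10,
        churn: 0.10,
        path_depth: 0.10,
        entrypoint: 0.10,
        examples: 0.05,
    };

    /// Sum of all weights.
    pub fn total(&self) -> f64 {
        self.documentation
            + self.centrality
            + self.test_linkage
            + self.churn
            + self.path_depth
            + self.entrypoint
            + self.examples
    }
}
```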
scribe-selection is the core decision-making component used by:

- … `--algorithm`, `--token-budget`, `--max-files` flags

Related crates and documents:

- `scribe-graph`: provides PageRank centrality scores
- `scribe-scaling`: token budgeting and performance optimization
- `scribe-analysis`: AST parsing for demotion and chunking
- `../../WHY_SCRIBE.md`: context on the intelligent selection philosophy