| Crates.io | gem-index-filter |
| lib.rs | gem-index-filter |
| version | 0.2.0 |
| created_at | 2025-10-24 03:25:28.828097+00 |
| updated_at | 2025-10-25 23:52:15.365058+00 |
| description | Fast streaming filter for RubyGems versions index files |
| homepage | https://github.com/gem-coop/gem-index-filter |
| repository | https://github.com/gem-coop/gem-index-filter |
| max_upload_size | |
| id | 1897949 |
| size | 67,454 |
Fast filtering for RubyGems versions index files. Designed for memory-constrained environments like Fastly Compute edge workers.
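Use --allow and --block together to keep the allowlist minus the blocklist (allowlist - blocklist). Use --strip-versions to replace version lists with 0 and reduce output size.
Usage:
gem-index-filter [OPTIONS] <versions-file> [output-file]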
Options:
--allow <file> Filter to only gems in allowlist file (one name per line)
--block <file> Filter out gems in blocklist file (one name per line)
--strip-versions Replace version lists with '0' in output
Examples:
# Pass through all gems (no filtering)
gem-index-filter versions
# Filter to only gems in allowlist
gem-index-filter --allow allowlist.txt versions filtered.txt
# Block specific gems
gem-index-filter --block blocklist.txt versions filtered.txt
# Allow mode with blocked gems removed (allowlist - blocklist)
gem-index-filter --allow allow.txt --block block.txt versions filtered.txt
# Strip version information (replace with '0')
gem-index-filter --strip-versions versions filtered.txt
# Stream from stdin
curl https://rubygems.org/versions | gem-index-filter --allow allowlist.txt - > filtered.txt
Filter file format (one gem name per line, # for comments):
rails
sinatra
activerecord
puma
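A minimal sketch of reading such a filter file into a set, for readers who want to build the list in their own code. The function name `parse_filter_list` is hypothetical and not part of the crate; the sketch also assumes a comment is a whole line whose first non-blank character is `#`, which the description above does not spell out.

```rust
use std::collections::HashSet;
use std::io::{self, BufRead};

/// Hypothetical helper (not the crate's API): parse one gem name per line,
/// skipping blank lines and lines whose first non-blank character is `#`.
fn parse_filter_list<R: BufRead>(reader: R) -> io::Result<HashSet<String>> {
    let mut names = HashSet::new();
    for line in reader.lines() {
        let line = line?;
        let name = line.trim();
        if name.is_empty() || name.starts_with('#') {
            continue; // blank line or comment
        }
        names.insert(name.to_string());
    }
    Ok(names)
}
```

The library example builds a `HashSet` of `&str`, so an owned set like this one would likely need to be borrowed first, for example with `names.iter().map(String::as_str).collect::<HashSet<&str>>()`.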
Library usage:
use gem_index_filter::{filter_versions_streaming, FilterMode};
use std::collections::HashSet;
use std::fs::File;
let input = File::open("versions")?;
let mut output = File::create("versions.filtered")?;
// Create allowlist
let mut allowlist = HashSet::new();
allowlist.insert("rails");
allowlist.insert("sinatra");
// Stream and filter
filter_versions_streaming(input, &mut output, FilterMode::Allow(&allowlist), false)?;
Other modes (each call is a standalone alternative, with input and output opened as above):
// Block mode - exclude specific gems
let mut blocklist = HashSet::new();
blocklist.insert("big-gem");
filter_versions_streaming(input, &mut output, FilterMode::Block(&blocklist), false)?;
// Passthrough mode - no filtering
filter_versions_streaming(input, &mut output, FilterMode::Passthrough, false)?;
// Strip versions while filtering
filter_versions_streaming(input, &mut output, FilterMode::Allow(&allowlist), true)?;
The versions index format uses one line per gem, with additional lines appended for updates:
created_at: 2024-04-01T00:00:05Z
---
gemname [-]version[,version]* MD5
When a gem appears multiple times, the last occurrence has the authoritative MD5.
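To make the "last occurrence wins" rule concrete, here is a small sketch that parses entry lines and records the most recent MD5 per gem. The names `Entry`, `parse_entry` and `latest_md5s` are hypothetical and not taken from the crate. The sketch assumes fields are separated by whitespace and that everything up to the `---` line is header.

```rust
use std::collections::HashMap;

/// One parsed entry line: `gemname [-]version[,version]* MD5`.
#[derive(Debug, PartialEq)]
struct Entry<'a> {
    name: &'a str,
    versions: Vec<&'a str>, // kept verbatim, including any leading `-`
    md5: &'a str,
}

/// Parse a single entry line; returns None unless it has exactly three fields.
fn parse_entry(line: &str) -> Option<Entry<'_>> {
    let mut parts = line.split_whitespace();
    let name = parts.next()?;
    let versions: Vec<&str> = parts.next()?.split(',').collect();
    let md5 = parts.next()?;
    if parts.next().is_some() {
        return None;
    }
    Some(Entry { name, versions, md5 })
}

/// Map each gem to the MD5 from its last occurrence in the index.
fn latest_md5s(index: &str) -> HashMap<&str, &str> {
    let mut latest = HashMap::new();
    let mut in_body = false;
    for line in index.lines() {
        if !in_body {
            // Skip the header up to and including the `---` separator.
            in_body = line == "---";
            continue;
        }
        if let Some(entry) = parse_entry(line) {
            // A later line for the same gem overwrites the earlier MD5.
            latest.insert(entry.name, entry.md5);
        }
    }
    latest
}
```

Because later lines overwrite earlier ones in the map, a gem that appears twice ends up with the MD5 from its final line.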
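Filtering rules:
Allow mode keeps a gem when gemlist.contains(gemname) == true.
Block mode keeps a gem when gemlist.contains(gemname) == false.
With both lists, compute allowlist - blocklist at startup, then use Allow mode.
The filtering is optimized for performance and simplicity.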
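The mode rules above can be sketched in a few lines. `Mode` here is a hypothetical stand-in for the crate's `FilterMode`, written only to illustrate the membership checks, and `effective_allowlist` is likewise an assumed helper name for the one-time set difference.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the crate's filter modes (not its real type).
enum Mode<'a> {
    Passthrough,
    Allow(&'a HashSet<&'a str>),
    Block(&'a HashSet<&'a str>),
}

impl Mode<'_> {
    /// Decide whether a gem's lines should be written to the output.
    fn keeps(&self, gemname: &str) -> bool {
        match self {
            Mode::Passthrough => true,
            Mode::Allow(list) => list.contains(gemname),
            Mode::Block(list) => !list.contains(gemname),
        }
    }
}

/// Compute `allowlist - blocklist` once at startup, so the per-line check
/// stays a single set lookup in Allow mode.
fn effective_allowlist<'a>(
    allow: &HashSet<&'a str>,
    block: &HashSet<&'a str>,
) -> HashSet<&'a str> {
    allow.difference(block).copied().collect()
}
```

Doing the subtraction up front means the hot loop never has to consult two sets per line.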
The versions file supports HTTP range requests, enabling incremental updates:
// Future API design
struct FilteredIndex {
data: Vec<u8>,
last_byte_offset: u64, // Track where we've processed to
}
impl FilteredIndex {
fn update(&mut self, range_data: &[u8]) {
// Process only new appended data
// Merge updates into existing filtered index
}
}
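One way the design above might be filled in is shown below. This is a sketch under assumptions, not the crate's implementation: the `range_header` method and the `keep` callback are hypothetical additions, and "merging" is reduced to appending the kept lines, which relies on the index being append-only with the last occurrence authoritative. No HTTP client is involved; the code only builds the header value and processes the bytes a range response would return.

```rust
/// Filled-in variant of the sketch above; the method bodies are assumptions.
struct FilteredIndex {
    data: Vec<u8>,
    last_byte_offset: u64, // number of upstream bytes already processed
}

impl FilteredIndex {
    /// Value for an HTTP `Range` header requesting everything after the
    /// bytes processed so far.
    fn range_header(&self) -> String {
        format!("bytes={}-", self.last_byte_offset)
    }

    /// Append the newly fetched lines whose gem name passes `keep`, then
    /// advance the offset by the number of bytes consumed.
    fn update(&mut self, range_data: &[u8], keep: impl Fn(&str) -> bool) {
        // Consume complete lines only; a trailing partial line is left to be
        // fetched again by the next range request.
        let complete = match range_data.iter().rposition(|&b| b == b'\n') {
            Some(pos) => &range_data[..=pos],
            None => return,
        };
        for line in complete.split_inclusive(|&b| b == b'\n') {
            // The gem name is the first space-separated field of the line.
            let name = line.split(|&b| b == b' ').next().unwrap_or(&[]);
            if std::str::from_utf8(name).map_or(false, |n| keep(n)) {
                self.data.extend_from_slice(line);
            }
        }
        self.last_byte_offset += complete.len() as u64;
    }
}
```

Advancing the offset only past complete lines keeps the next request aligned on a line boundary, at the cost of re-downloading a few bytes when a response ends mid-line.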
Strategy:
Use Range: bytes={offset}- for incremental updates
Development:
# Run tests
cargo test
# Build release binary
cargo build --release
# For Fastly Compute (wasm32-wasi target; newer Rust toolchains name this target wasm32-wasip1)
cargo build --target wasm32-wasi --release
# Run all tests
cargo test
# Test with real data (if you have a versions file)
cargo run --release -- versions output.txt
License: MIT