| Crates.io | compression-prompt |
| lib.rs | compression-prompt |
| version | 0.1.2 |
| created_at | 2025-10-22 10:08:35.137674+00 |
| updated_at | 2025-11-06 16:45:12.4176+00 |
| description | Fast statistical compression for LLM prompts - 50% token reduction with 91% quality retention |
| homepage | https://github.com/hivellm/compression-prompt |
| repository | https://github.com/hivellm/compression-prompt |
| max_upload_size | |
| id | 1895400 |
| size | 572,570 |
Fast, intelligent prompt compression for LLMs - Save 50% tokens while maintaining 91% quality
A Rust implementation of statistical filtering for prompt compression. Achieves 50% token reduction with 91% quality retention (Claude Sonnet) in <1ms, validated across 6 flagship LLMs with 350+ test pairs.
Validated on 200 real arXiv papers (1.6M tokens):
```
COMPRESSION SUCCESSFUL!
Original:           1,662,729 tokens
Compressed:           831,364 tokens
Savings:              831,365 tokens (50.0%)
Time:                 0.92s (10.58 MB/s)
Quality Score:        88.6%
Keyword Retention:    100.0%
Entity Retention:     91.8%
```
Statistical filtering uses intelligent token scoring to remove low-value words while preserving meaning:
What gets removed: "the" (75K), "and" (36K), "of" (35K), "a" (28K)
What stays: Keywords, entities, technical terms, numbers (100% retention)
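To make the idea concrete, here is a toy sketch of frequency-based token scoring (our illustration only; `filter_low_value_tokens` is a hypothetical name, and the crate's actual scorer is a weighted blend of several signals, not frequency alone):

```rust
use std::collections::HashMap;

/// Toy frequency-based filter (illustrative only, not the crate's scorer):
/// common tokens like "the"/"and" score low and get dropped; rare tokens
/// (keywords, entities, numbers) score high and survive.
fn filter_low_value_tokens(text: &str, keep_fraction: f32) -> String {
    let tokens: Vec<&str> = text.split_whitespace().collect();
    let mut freq: HashMap<&str, usize> = HashMap::new();
    for t in &tokens {
        *freq.entry(*t).or_insert(0) += 1;
    }
    // Score each position by inverse frequency, keep the top fraction,
    // then restore original word order.
    let mut scored: Vec<(usize, f32)> = (0..tokens.len())
        .map(|i| (i, 1.0 / freq[tokens[i]] as f32))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    let keep = ((tokens.len() as f32) * keep_fraction).ceil() as usize;
    let mut kept: Vec<usize> = scored.into_iter().take(keep).map(|(i, _)| i).collect();
    kept.sort_unstable();
    kept.into_iter().map(|i| tokens[i]).collect::<Vec<_>>().join(" ")
}
```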
Critical Fix: JSON structures, code blocks, and structured data are now 100% preserved during compression.
Available in all implementations:
```
// Before fix: JSON could be partially removed ❌
{"user": {"name": "Alice", "age": 30}} → {"user": {"name": "Alice" 30}}

// After fix: JSON completely preserved ✅
{"user": {"name": "Alice", "age": 30}} → {"user": {"name": "Alice", "age": 30}}
```
Protected Content:
- JSON and other structured data
- Code blocks (```code```)
- File paths (/path/to/file.ext)
- Identifiers (camelCase, snake_case, UPPER_CASE)
See JSON Preservation Fix Documentation for details.
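For intuition, one way such preservation can work is to cut brace-balanced spans out before filtering and splice them back verbatim afterwards. The sketch below is a simplified stand-in (`compress_preserving_json` is hypothetical; the crate's real detection also covers code blocks, paths, and identifiers):

```rust
/// Simplified sketch of JSON preservation (not the crate's implementation):
/// compress only the prose between brace-balanced spans; copy the spans
/// through untouched.
fn compress_preserving_json(text: &str, compress: impl Fn(&str) -> String) -> String {
    let bytes = text.as_bytes();
    let (mut out, mut prose_start, mut i) = (String::new(), 0, 0);
    while i < bytes.len() {
        if bytes[i] == b'{' {
            // Scan forward to the matching closing brace.
            let (mut depth, mut j) = (0usize, i);
            while j < bytes.len() {
                match bytes[j] {
                    b'{' => depth += 1,
                    b'}' => {
                        depth -= 1;
                        if depth == 0 { break; }
                    }
                    _ => {}
                }
                j += 1;
            }
            if j < bytes.len() {
                out.push_str(&compress(&text[prose_start..i])); // prose: filtered
                out.push_str(&text[i..=j]);                     // JSON: verbatim
                prose_start = j + 1;
                i = j + 1;
                continue;
            }
        }
        i += 1;
    }
    out.push_str(&compress(&text[prose_start..]));
    out
}

fn main() {
    let demo = r#"Send the payload {"user": {"name": "Alice", "age": 30}} to the API."#;
    let out = compress_preserving_json(demo, |s| s.replace("the ", ""));
    println!("{out}"); // the JSON span survives intact
}
```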
```bash
cd rust && cargo build --release
```
```rust
use compression_prompt::statistical_filter::{StatisticalFilter, StatisticalFilterConfig};
use compression_prompt::tokenizer::{MockTokenizer, Tokenizer};

// Load the prompt to compress (any String works here)
let text = std::fs::read_to_string("prompt.txt").unwrap();

// Use the recommended default (50% compression, 89% quality)
let config = StatisticalFilterConfig::default();
let filter = StatisticalFilter::new(config);
let tokenizer = MockTokenizer;

let compressed = filter.compress(&text, &tokenizer);

// Calculate savings
let savings = 1.0
    - (tokenizer.count_tokens(&compressed) as f32
        / tokenizer.count_tokens(&text) as f32);
println!("Savings: {:.1}%", savings * 100.0);
```
```rust
// `compression_ratio` is the fraction of tokens KEPT after filtering.

// Balanced (default) - 50% compression, 89% quality ⭐
let balanced = StatisticalFilterConfig::default();

// Conservative - 30% compression, 96% quality
let conservative = StatisticalFilterConfig {
    compression_ratio: 0.7, // keep 70% of tokens
    ..Default::default()
};

// Aggressive - 70% compression, 71% quality
let aggressive = StatisticalFilterConfig {
    compression_ratio: 0.3, // keep 30% of tokens
    ..Default::default()
};
```
NEW: Inspired by DeepSeek-OCR's optical context compression, this feature renders compressed text into 1024x1024 images for vision-model consumption.
```rust
use compression_prompt::{StatisticalFilter, ImageRenderer};
use compression_prompt::tokenizer::MockTokenizer;

// Compress text with statistical filtering
let filter = StatisticalFilter::default();
let tokenizer = MockTokenizer;
let compressed = filter.compress(&text, &tokenizer);

// Render to PNG image
let renderer = ImageRenderer::default();
let png_data = renderer.render_to_png(&compressed)?;
std::fs::write("compressed.png", png_data)?;

// Or render to JPEG (66% smaller than PNG)
let jpeg_data = renderer.render_to_jpeg(&compressed, 85)?; // quality: 85
std::fs::write("compressed.jpg", jpeg_data)?;
```
Benefits:
Image Formats:
- PNG: lossless
- JPEG: lossy, roughly 66% smaller than PNG at quality 85
Example:
```bash
# Generate PNG images (50% compression)
cargo run --release --example paper_to_png_50pct
# Output: rnn_paper_compressed_page1.png, page2.png, page3.png...

# Compare PNG vs JPEG formats
cargo run --release --example compare_image_formats
# Tests different JPEG quality levels
```
Use Cases:
Status: Beta - works well; extensive validation with vision models is still pending.
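As a sketch of the intended consumption path (an assumption on our part, not part of this crate: the exact request shape depends on your vision-model API, and the `base64` crate is an extra dependency here):

```rust
use base64::Engine;

// Hypothetical glue code: turn a rendered page into a data URL that can be
// attached as an image part in a vision-model request.
fn png_to_data_url(png_data: &[u8]) -> String {
    let b64 = base64::engine::general_purpose::STANDARD.encode(png_data);
    format!("data:image/png;base64,{b64}")
}
```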
Original (1.6M tokens):
```
Bayesian Active Learning for Classification... Information theoretic
active learning has been widely studied for probabilistic models...
[1.6 million more tokens...]
```

Compressed (831K tokens - 50% reduction in 0.92s):
```
Bayesian Active Classification... Information been widely studied
probabilistic models...
[compressed to 831K tokens...]
```

Removed: 831,365 tokens (mainly "the", "and", "of", "a", "to")
Preserved: 100% of keywords, 92% of entities
Tested across 6 flagship LLMs with 350+ A/B test pairs:
At 50% compression (statistical_50):

| LLM | Quality | Token Savings | Use Case |
|---|---|---|---|
| Grok-4 | 93% | 50% | Best overall performance |
| Claude 3.5 Sonnet | 91% | 50% | Best cost-benefit ⭐ |
| Gemini Pro | 89% | 50% | Balanced production |
| GPT-5 | 89% | 50% | Keyword retention |
| Grok | 88% | 50% | Technical content |
| Claude Haiku | 87% | 50% | Cost-optimized |
At 30% compression (statistical_70, conservative):

| LLM | Quality | Token Savings | Use Case |
|---|---|---|---|
| Grok-4 | 98% | 30% | Critical tasks |
| Claude 3.5 Sonnet | 97% | 30% | High precision |
| GPT-5 | 96% | 30% | Legal/Medical |
| Gemini Pro | 96% | 30% | Near-perfect |
| Grok | 95% | 30% | Complex reasoning |
| Claude Haiku | 94% | 30% | Recommended for Haiku |
| Mode (tokens kept) | Token Savings | Speed | Keyword Retention | Entity Retention |
|---|---|---|---|---|
| statistical_50 (keep 50%) ⭐ | 50% | 0.16ms | 92.0% | 89.5% |
| statistical_70 (keep 70%) | 30% | 0.15ms | 99.2% | 98.4% |
| statistical_30 (keep 30%) | 70% | 0.17ms | 72.4% | 71.5% |
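Note the naming: the mode number is the fraction of tokens kept (the `compression_ratio`), not the savings, so statistical_70 yields 30% savings. A hypothetical helper (`config_for_mode` is ours, not part of the crate) makes the mapping explicit:

```rust
use compression_prompt::statistical_filter::StatisticalFilterConfig;

/// Hypothetical helper mapping the benchmark mode names to configs.
fn config_for_mode(mode: &str) -> StatisticalFilterConfig {
    let compression_ratio = match mode {
        "statistical_30" => 0.3, // keep 30% -> 70% savings, lowest retention
        "statistical_70" => 0.7, // keep 70% -> 30% savings, highest retention
        _ => 0.5,                // statistical_50: the recommended default
    };
    StatisticalFilterConfig { compression_ratio, ..Default::default() }
}
```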
For 1 million tokens with statistical_50:
| LLM | Cost Before | Cost After | Savings | Quality Retained |
|---|---|---|---|---|
| Grok-4 | $5.00 | $2.50 | $2.50 (50%) | 93% |
| Claude Sonnet | $15.00 | $7.50 | $7.50 (50%) | 91% ⭐ |
| GPT-5 | $5.00 | $2.50 | $2.50 (50%) | 89% |
| Gemini Pro | $3.50 | $1.75 | $1.75 (50%) | 89% |
ROI: 91% quality at 50% cost reduction is an excellent trade-off for high-volume applications (Claude Sonnet).
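Since cost is linear in token count, projecting the table to your own volume is simple arithmetic; a tiny sketch of that math (ours, not the crate's; the price is the table's per-million-token figure):

```rust
/// Cost scales linearly with tokens, so halving tokens halves the bill.
fn cost_usd(tokens: u64, price_per_mtok: f64) -> f64 {
    tokens as f64 / 1_000_000.0 * price_per_mtok
}

fn main() {
    let price = 15.00; // Claude Sonnet $/M tokens, from the table above
    let (before, after) = (1_000_000u64, 500_000u64); // statistical_50
    println!(
        "before ${:.2}, after ${:.2}, saved ${:.2}",
        cost_usd(before, price),
        cost_usd(after, price),
        cost_usd(before, price) - cost_usd(after, price),
    ); // -> before $15.00, after $7.50, saved $7.50
}
```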
```bash
# Test on full dataset (200 papers, 1.6M tokens)
cargo run --release --bin test_statistical

# Quality benchmark (20 papers with detailed metrics)
cargo run --release --bin bench_quality

# Generate LLM evaluation dataset (63 prompt pairs)
cargo run --release --bin generate_llm_dataset
```
350+ test pairs validated across 6 LLMs:
```bash
# View aggregated results
cat benchmarks/ab_tests/ab_test_comparison.md

# View LLM-specific reports
cat benchmarks/CLAUDE-SONNET-TEST-AB.md
cat benchmarks/GROK-4-TEST-AB.md
cat benchmarks/GPT5-TEST-AB.md
cat benchmarks/GEMINI-TEST-AB.md

# Access individual test files
ls benchmarks/llm_tests/100papers_statistical_50/  # 150 files
ls benchmarks/llm_tests/200papers_statistical_50/  # 300 files
```
Test Coverage:
All test pairs are available for independent validation:
```bash
# View a specific test pair
cat benchmarks/llm_tests/100papers_statistical_50/test_001_original.txt
cat benchmarks/llm_tests/100papers_statistical_50/test_001_compressed.txt

# Test with your LLM
python3 scripts/test_with_llm.py \
  --original benchmarks/llm_tests/100papers_statistical_50/test_001_original.txt \
  --compressed benchmarks/llm_tests/100papers_statistical_50/test_001_compressed.txt \
  --model claude-3-5-sonnet

# Expected results based on our validation:
# - Claude Sonnet: 91% quality, 50% savings
# - Grok-4: 93% quality, 50% savings
# - GPT-5: 89% quality, 50% savings
```
Customize scoring weights for your use case:
```rust
let config = StatisticalFilterConfig {
    compression_ratio: 0.5,
    idf_weight: 0.3,      // Rare-word importance (default: 0.3)
    position_weight: 0.2, // Start/end prioritization (default: 0.2)
    pos_weight: 0.2,      // Content-word importance (default: 0.2)
    entity_weight: 0.2,   // Named-entity importance (default: 0.2)
    entropy_weight: 0.1,  // Vocabulary diversity (default: 0.1)
    ..Default::default()
};
```
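One plausible reading of how these weights combine, shown as a sketch (an assumption on our part: a linear blend over per-token features normalized to [0, 1]; the crate's exact formula may differ, and `TokenFeatures`/`token_score` are hypothetical names):

```rust
/// Hypothetical per-token features, each normalized to [0, 1].
struct TokenFeatures {
    idf: f32,      // rarity across the document/corpus
    position: f32, // closeness to the start or end
    pos: f32,      // content-word likelihood (vs. function word)
    entity: f32,   // named-entity likelihood
    entropy: f32,  // local vocabulary diversity
}

/// Linear blend using the default weights above; at a given
/// compression_ratio, the lowest-scoring tokens are dropped first.
fn token_score(f: &TokenFeatures) -> f32 {
    0.3 * f.idf + 0.2 * f.position + 0.2 * f.pos + 0.2 * f.entity + 0.1 * f.entropy
}
```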
See ROADMAP.md for planned features.
MIT
| Your LLM | Recommended Config | Quality | Savings | Why |
|---|---|---|---|---|
| Grok-4 | statistical_50 | 93% | 50% | Best overall |
| Claude Sonnet | statistical_50 | 91% | 50% | Best cost-benefit ⭐ |
| GPT-5 | statistical_50 | 89% | 50% | Good balance |
| Gemini Pro | statistical_50 | 89% | 50% | Production ready |
| Claude Haiku | statistical_70 | 94% | 30% | Needs structure |
| Grok | statistical_70 | 95% | 30% | Conservative |
Don't know which to choose? → Use Claude Sonnet + statistical_50 for the best cost-benefit ratio.