sakurs-cli

crates.io: sakurs-cli
lib.rs: sakurs-cli
version: 0.1.1
created_at: 2025-07-27 14:02:11.556415+00
updated_at: 2025-07-27 15:29:50.751091+00
description: Command-line interface for Sakurs sentence boundary detection
homepage: https://github.com/sog4be/sakurs
repository: https://github.com/sog4be/sakurs
id: 1770106
size: 89,338
(sog4be)

documentation: https://docs.rs/sakurs-cli

README

sakurs-cli

Fast, parallel sentence boundary detection for the command line.

Installation

cargo install sakurs-cli

After installation, the sakurs command will be available in your PATH.
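
To confirm the install worked, a small check along these lines may help. This is a sketch: `check_sakurs` is a hypothetical helper name, `--version` is the flag listed in the command reference, and `~/.cargo/bin` is assumed to be the install location because it is Cargo's default.

```shell
# Hypothetical helper: report whether the sakurs binary is reachable on PATH.
check_sakurs() {
    if command -v sakurs >/dev/null 2>&1; then
        # --version is listed in the command reference
        sakurs --version
    else
        echo "sakurs not found; check that ~/.cargo/bin is on your PATH" >&2
        return 1
    fi
}

# Run the check without aborting the shell if the binary is missing.
check_sakurs || true
```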

Quick Start

# Process text files
sakurs process -i document.txt

# Process multiple files with glob pattern
sakurs process -i "*.txt"

# Process from stdin
echo "Hello world. How are you?" | sakurs process -i -

# Output as JSON
sakurs process -i document.txt -f json

Features

  • Parallel Processing: Automatically spreads work across multiple CPU cores
  • Multiple Output Formats: Plain text, JSON, or quiet mode for different use cases
  • Language Support: Built-in configurations for English and Japanese

Usage Examples

Basic File Processing

# Process a single file
sakurs process -i report.txt

# Process with specific language
sakurs process -i japanese_text.txt -l japanese

Batch Processing

# Process all text files in a directory
sakurs process -i "documents/*.txt"

# Recursive processing with complex patterns
sakurs process -i "**/*.{txt,md}"

Output Formats

# Default format (human-readable)
sakurs process -i file.txt

# JSON format for programmatic use
sakurs process -i file.txt -f json

# Quiet mode (only sentence count)
sakurs process -i file.txt -f quiet

Performance Tuning

For large files, you can set the thread count and chunk size explicitly:

# Use 8 threads with 1MB chunks
sakurs process -i large_file.txt --threads 8 --chunk-kb 1024

# Sequential processing (useful for debugging)
sakurs process -i file.txt --sequential
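
Because `--chunk-kb` takes its value in kilobytes, a chunk size given in megabytes has to be multiplied by 1024 first. The sketch below wraps that conversion; `mb_to_kb` and `run_tuned` are hypothetical helper names, not part of sakurs, and the file name is the one from the example above.

```shell
# Hypothetical helper: convert a chunk size in MB to the KB value
# that --chunk-kb expects (1 MB = 1024 KB).
mb_to_kb() {
    echo $(( $1 * 1024 ))
}

# Hypothetical wrapper: run sakurs with a thread count and a chunk size in MB.
# Usage: run_tuned <file> <threads> <chunk_mb>
run_tuned() {
    sakurs process -i "$1" --threads "$2" --chunk-kb "$(mb_to_kb "$3")"
}

# Only invoke the tool if it is installed and the example file exists.
if command -v sakurs >/dev/null 2>&1 && [ -f large_file.txt ]; then
    run_tuned large_file.txt 8 1
fi
```

Timing the same file with a few different values (for example with the shell's `time` keyword) is one way to find a setting that suits a given machine.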

Command Reference

sakurs process [OPTIONS]

OPTIONS:
    -i, --input <INPUT>           Input file(s) or '-' for stdin
    -o, --output <OUTPUT>         Output file (default: stdout)
    -f, --format <FORMAT>         Output format [default: text]
                                  [possible values: text, json, quiet]
    -l, --language <LANGUAGE>     Language for sentence detection [default: en]
                                  [possible values: en, ja, english, japanese]
    --sequential                  Force sequential processing
    --parallel                    Force parallel processing (default: auto)
    --threads <N>                 Number of threads (default: CPU count)
    --chunk-kb <SIZE>             Chunk size in KB [default: 256]
    -h, --help                    Print help
    -V, --version                 Print version

Examples

Processing Japanese Text

sakurs process -i japanese_novel.txt -l ja -f json > sentences.json

Analyzing Code Documentation

# Count sentences in all README files (quiet mode prints only the count)
sakurs process -i "**/README.md" -f quiet

Pipeline Integration

# Count sentences in git commit messages
git log --format=%B | sakurs process -i - -f quiet

# Extract sentences from specific files
find . -name "*.txt" -exec sakurs process -i {} \;
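
The `find -exec` form sends every file's sentences to the same stdout stream. When one output file per input is more convenient, the documented `-o` option can be combined with a loop. This is a sketch for bash: the `.sentences.txt` suffix and the `out_path` helper are illustrative choices, not something sakurs defines.

```shell
# Hypothetical helper: derive an output path next to the input,
# e.g. docs/a.txt -> docs/a.sentences.txt
out_path() {
    printf '%s\n' "${1%.txt}.sentences.txt"
}

# Process each .txt file under the current directory, skipping outputs
# from earlier runs. -print0 with read -d '' (a bash feature) keeps
# filenames containing spaces intact.
if command -v sakurs >/dev/null 2>&1; then
    find . -name "*.txt" ! -name "*.sentences.txt" -print0 |
    while IFS= read -r -d '' f; do
        sakurs process -i "$f" -o "$(out_path "$f")"
    done
fi
```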

License

MIT License. See LICENSE for details.
