| Field | Value |
|---|---|
| Crates.io | sweepga |
| lib.rs | sweepga |
| version | 0.1.1 |
| created_at | 2025-11-05 21:07:47.279779+00 |
| updated_at | 2025-11-07 00:24:53.713775+00 |
| description | Efficient pangenome alignment filtering and sparsification tool |
| homepage | |
| repository | https://github.com/pangenome/sweepga |
| max_upload_size | |
| id | 1918638 |
| size | 820,062 |
Fast genome alignment with plane sweep filtering. SweepGA wraps the FastGA aligner and applies plane sweep filtering to keep the best non-overlapping alignments.
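SweepGA can either align FASTA input with FastGA and filter the result in one step, or filter an existing PAF file read from a file or stdin.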
By default, it applies 1:1 plane sweep filtering to keep the single best mapping per query-target chromosome pair.
This package includes two binaries:

- `sweepga` - Genome alignment and filtering tool
- `alnstats` - Alignment statistics and validation tool

Use alnstats to verify filtering results:
```bash
# Show statistics for a PAF file
alnstats alignments.paf
# Compare before/after filtering
alnstats raw.paf filtered.paf
# Detailed per-genome-pair breakdown
alnstats alignments.paf -d
```
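For a quick cross-check that does not depend on alnstats, the sketch below counts mappings and sums aligned bases in two PAF files. It is a hypothetical helper (`summarize_paf` and `compare` are illustrative names, not part of sweepga) and assumes standard PAF, where column 11 holds the alignment block length; alnstats' own metrics and output format may differ.

```python
from typing import Iterable, NamedTuple


class PafSummary(NamedTuple):
    mappings: int    # number of PAF records
    aligned_bp: int  # sum of alignment block lengths (PAF column 11)


def summarize_paf(lines: Iterable[str]) -> PafSummary:
    """Count records and total block length in PAF text (hypothetical helper)."""
    mappings = 0
    aligned_bp = 0
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            raise ValueError(f"expected at least 12 PAF columns, got {len(fields)}")
        mappings += 1
        aligned_bp += int(fields[10])  # column 11 (0-based index 10): block length
    return PafSummary(mappings, aligned_bp)


def compare(raw: Iterable[str], filtered: Iterable[str]) -> str:
    """One-line before/after report for a filtering run."""
    before = summarize_paf(raw)
    after = summarize_paf(filtered)
    kept = after.mappings / before.mappings if before.mappings else 0.0
    return (
        f"mappings: {before.mappings} -> {after.mappings} ({kept:.1%} kept), "
        f"aligned bp: {before.aligned_bp} -> {after.aligned_bp}"
    )
```

Usage: `with open("raw.paf") as r, open("filtered.paf") as f: print(compare(r, f))`. Files are streamed line by line, so large PAF files are not loaded into memory.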
```bash
cargo install sweepga
```
This installs both sweepga and alnstats binaries from the published crate.
Requires Rust 1.70+. Clone and install:
```bash
git clone https://github.com/pangenome/sweepga.git
cd sweepga
cargo install --force --path .
```
Symptoms: Build fails with linker errors like:
```text
ld: /usr/lib/x86_64-linux-gnu/librt.so: undefined reference to '__pthread_barrier_wait@GLIBC_PRIVATE'
```
This occurs on systems with multiple package managers (e.g., Debian + Guix) providing different glibc versions.
Fix: Use the clean build script to isolate the build from these environment conflicts:
```bash
./scripts/build-clean.sh --install
```
See docs/BUILD-NOTES.md for details.
On Guix, Rust can be set up and sweepga built as follows (adapted from https://issues.genenetwork.org/topics/rust/guix-rust-bootstrap):
```bash
# Update Guix
mkdir -p $HOME/opt
guix pull -p $HOME/opt/guix-pull-20251012 --url=https://codeberg.org/guix/guix
# Be sure to use the updated Guix
alias guix=$HOME/opt/guix-pull-20251012/bin/guix
# Update Rust and Cargo
mkdir -p ~/.cargo ~/.rustup # to prevent rebuilds
guix shell --share=$HOME/.cargo --share=$HOME/.rustup -C -N -D -F -v 3 guix gcc-toolchain make libdeflate pkg-config xz coreutils sed zstd zlib nss-certs openssl curl
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
. ~/.cargo/env
rustup default stable
exit
# Clone the repository
git clone https://github.com/pangenome/sweepga.git
cd sweepga
guix shell --share=$HOME/.cargo --share=$HOME/.rustup -C -N -D -F -v 3 guix gcc-toolchain make libdeflate pkg-config xz coreutils sed zstd zlib nss-certs openssl curl cmake clang # we need cmake and clang too for building
. ~/.cargo/env
export LD_LIBRARY_PATH=$GUIX_ENVIRONMENT/lib
cargo build --release
# Check the profile path and put it into your ~/.bashrc or ~/.zshrc
echo $GUIX_ENVIRONMENT/
# example output: /gnu/store/whgjblccmr4kdmsi4vg8h0p53m5f7sch-profile/
exit
# Replace the path below with the one printed above (yours will differ)
echo "export GUIX_ENVIRONMENT=/gnu/store/whgjblccmr4kdmsi4vg8h0p53m5f7sch-profile/" >> ~/.bashrc # or ~/.zshrc
source ~/.bashrc # or ~/.zshrc
# Use the executable in sweepga/target/release
env LD_LIBRARY_PATH=$GUIX_ENVIRONMENT/lib ./target/release/sweepga --help
```
```bash
# Self-alignment with 1:1 filtering
sweepga genome.fa.gz > output.paf
# Pairwise alignment (target, query order)
sweepga target.fa query.fa > output.paf
# With 2 threads
sweepga genome.fa.gz -t 2 > output.paf
```
```bash
# Default: 1:1 plane sweep filtering
cat alignments.paf | sweepga > filtered.paf
# Keep best mapping per query only (1:∞)
cat alignments.paf | sweepga -n 1 > filtered.paf
# No filtering, just pass through
cat alignments.paf | sweepga -n many > output.paf
# Read from file instead of stdin
sweepga alignments.paf > filtered.paf
```
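The `-n` values follow an n:m pattern (per query : per target). A minimal sketch of how such a value could be interpreted, assuming "many" and "∞" both mean unbounded and that a single number bounds only the query axis, as the `1:∞` comment above suggests. `parse_num_mappings` is a hypothetical name, not sweepga's actual parser.

```python
import math
from typing import Tuple

UNBOUNDED = math.inf  # represents "many" / "∞"


def _parse_limit(token: str) -> float:
    """Parse one side of an n:m value: a positive integer, 'many' or '∞'."""
    token = token.strip()
    if token.lower() == "many" or token == "∞":
        return UNBOUNDED
    value = int(token)
    if value < 1:
        raise ValueError(f"limit must be >= 1, got {value}")
    return value


def parse_num_mappings(spec: str) -> Tuple[float, float]:
    """Return (per_query_limit, per_target_limit) for a -n value (hypothetical)."""
    if ":" in spec:
        query, target = spec.split(":", 1)
        return _parse_limit(query), _parse_limit(target)
    # A single value ("1", "many") bounds only the query axis.
    return _parse_limit(spec), UNBOUNDED
```

Under these assumptions `"1:1"` gives `(1, 1)`, `"1"` gives `(1, inf)` and `"many"` gives `(inf, inf)`.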
```bash
# Direct alignment and filtering in one step
sweepga data/scerevisiae8.fa.gz > scerevisiae8.paf
# Result: ~26K mappings (1:1 filtered)
# - Each genome pair gets best alignment per chromosome pair
# - Self-mappings excluded by default (use --self to include)
```
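To see how the mappings in a result like this are spread over genome pairs, the sketch below counts PAF records per (query genome, target genome) pair. It assumes PanSN-style sequence names (`sample#haplotype#contig`), takes the `sample#haplotype` prefix as the genome, and treats a record as a self-mapping when query and target come from the same genome. These are assumptions about how genomes and self-mappings are defined, and `genome_of` / `genome_pair_counts` are hypothetical helpers.

```python
from collections import Counter
from typing import Iterable, Tuple


def genome_of(seq_name: str) -> str:
    """Assumed PanSN naming (sample#haplotype#contig): return 'sample#haplotype'.

    Names without two '#' separators are treated as their own genome."""
    parts = seq_name.split("#")
    return "#".join(parts[:2]) if len(parts) >= 3 else seq_name


def genome_pair_counts(lines: Iterable[str], include_self: bool = False) -> Counter:
    """Count PAF records per (query genome, target genome) pair.

    Records whose query and target share a genome are skipped unless
    include_self is True (an assumed analogue of --self)."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        pair: Tuple[str, str] = (genome_of(fields[0]), genome_of(fields[5]))
        if pair[0] == pair[1] and not include_self:
            continue
        counts[pair] += 1
    return counts
```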
Options:

- `-n/--num-mappings` - n:m-best mappings in query:target dimensions (default: 1:1)
  - "1:1" - Orthogonal: keep best mapping on both query and target axes
  - "1" - Keep best mapping per query position only
  - "many" - No filtering, keep all mappings
  - "n:m" - Keep top n per query, top m per target (use ∞/many for unbounded)
- `-o/--overlap` - Maximum overlap ratio (default: 0.95)
- `-l/--min-block-length` - Minimum alignment block length (default: 0)
- `-i/--min-identity` - Minimum identity threshold (0-1 fraction, 1-100%, or "aniN")
- `-t/--threads` - Number of threads (default: 8)
- `--self` - Include self-mappings (excluded by default)
- `-f/--no-filter` - Disable all filtering
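The `-l` and `-i` options drop short or low-identity alignments. A sketch of one plausible interpretation: identity is computed as matching bases divided by block length (PAF columns 10 and 11), and an `-i` value of at most 1 is read as a fraction while a value in (1, 100] is read as a percentage. Both rules and the helper names are assumptions, and the "aniN" form is not modelled.

```python
def parse_min_identity(value: str) -> float:
    """Interpret an -i value as a fraction in [0, 1] (hypothetical rules).

    Values <= 1 are taken as fractions, values in (1, 100] as percentages.
    The "aniN" form is not handled here."""
    number = float(value.strip().rstrip("%"))
    if 0.0 <= number <= 1.0:
        return number
    if 1.0 < number <= 100.0:
        return number / 100.0
    raise ValueError(f"identity out of range: {value!r}")


def passes_thresholds(paf_line: str, min_identity: float, min_block_length: int = 0) -> bool:
    """Apply -l/-i style thresholds to one PAF record.

    Identity is assumed to be matches / block length (PAF columns 10 and 11)."""
    fields = paf_line.rstrip("\n").split("\t")
    matches, block_length = int(fields[9]), int(fields[10])
    identity = matches / block_length if block_length else 0.0
    return block_length >= min_block_length and identity >= min_identity
```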
The plane sweep algorithm operates per query-target chromosome pair:

1. Each mapping is scored as identity × log(block_length) (matching wfmash).
2. Mappings are kept according to the `-n` setting:
   - 1:1: Keep the single best mapping per position on both query and target
   - 1: Keep the best mapping per query position (multiple targets allowed)
   - many: Keep all non-overlapping mappings
3. Overlap between kept mappings is limited by the maximum overlap ratio (`-o`).

SweepGA: Fast plane sweep filtering for genome alignments https://github.com/pangenome/sweepga
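The scoring and `-n` selection steps of the plane sweep can be sketched as a simple segment-by-segment sweep. This is not sweepga's implementation, only an illustration under stated assumptions: the input is the mappings of one query-target chromosome pair; log is natural (the base does not change the ranking, since it scales every score by the same positive constant); a mapping survives an axis if it ranks in the top n by score at some position of that axis; and n:m keeps the intersection of the query-axis and target-axis survivors. The `-o` overlap check is not modelled, and the cost is O(segments × mappings) rather than an optimised event queue.

```python
import math
from typing import List, NamedTuple, Set, Tuple


class Mapping(NamedTuple):
    qstart: int  # half-open query interval [qstart, qend)
    qend: int
    tstart: int  # half-open target interval [tstart, tend)
    tend: int
    identity: float
    block_length: int  # must be > 0


def score(m: Mapping) -> float:
    # identity × log(block_length); natural log assumed
    return m.identity * math.log(m.block_length)


def sweep_top_n(mappings: List[Mapping], n: float, axis: str) -> Set[int]:
    """Indices of mappings ranked in the top n by score at some position on `axis`.

    Sweeps the elementary segments between interval endpoints; inside one
    segment the set of covering mappings does not change."""
    if axis not in ("query", "target"):
        raise ValueError(f"axis must be 'query' or 'target', got {axis!r}")
    if n == math.inf:
        return set(range(len(mappings)))

    def span(m: Mapping) -> Tuple[int, int]:
        return (m.qstart, m.qend) if axis == "query" else (m.tstart, m.tend)

    points = sorted({p for m in mappings for p in span(m)})
    kept: Set[int] = set()
    for left, right in zip(points, points[1:]):
        covering = [i for i, m in enumerate(mappings)
                    if span(m)[0] <= left and right <= span(m)[1]]
        covering.sort(key=lambda i: (-score(mappings[i]), i))  # best first, ties by input order
        kept.update(covering[: int(n)])
    return kept


def filter_n_m(mappings: List[Mapping], n_query: float, m_target: float) -> Set[int]:
    """Keep mappings that survive both the query-axis and the target-axis sweeps.

    1:1 -> filter_n_m(ms, 1, 1); 1:∞ -> filter_n_m(ms, 1, math.inf)."""
    return sweep_top_n(mappings, n_query, "query") & sweep_top_n(mappings, m_target, "target")
```

With this rule a mapping that is fully covered on an axis by a higher-scoring mapping is dropped under a limit of 1, while any mapping that is best somewhere on that axis is kept.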