unimorph-core

Crates.io: unimorph-core
lib.rs: unimorph-core
version: 0.2.1
created_at: 2026-01-06 02:59:45.545992+00
updated_at: 2026-01-19 06:51:33.445207+00
description: Core library for UniMorph morphological data
repository: https://github.com/joshrotenberg/unimorph-rs
id: 2025089
size: 219,827
Josh Rotenberg (joshrotenberg)


unimorph-rs


A Rust toolkit for working with UniMorph morphological data.

What is UniMorph?

UniMorph provides morphological paradigm data for 180+ languages in a unified annotation format. Each entry is a triple of lemma, inflected form, and morphological features:

lemma       form        features
hablar      hablo       V;IND;PRS;1;SG
hablar      hablado     V;V.PTCP;PST;MASC;SG
ser         soy         V;IND;PRS;1;SG
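Each line of a UniMorph data file is tab-separated in this lemma, form, features order, with feature tags joined by `;`, so a single entry can be parsed with the standard library alone. This is an illustrative sketch only: `Entry` and `parse_line` are hypothetical names, not the unimorph-core API.

```rust
/// Hypothetical in-memory form of one UniMorph entry (not the crate's own type).
#[derive(Debug, PartialEq)]
struct Entry {
    lemma: String,
    form: String,
    features: Vec<String>,
}

/// Parses one tab-separated line: lemma, form, features (features split on ';').
/// Returns None for lines with fewer than three columns or an empty column.
fn parse_line(line: &str) -> Option<Entry> {
    let mut cols = line.split('\t');
    let lemma = cols.next()?.trim();
    let form = cols.next()?.trim();
    let features = cols.next()?.trim();
    if lemma.is_empty() || form.is_empty() || features.is_empty() {
        return None;
    }
    Some(Entry {
        lemma: lemma.to_string(),
        form: form.to_string(),
        features: features.split(';').map(|f| f.to_string()).collect(),
    })
}

fn main() {
    let e = parse_line("hablar\thablo\tV;IND;PRS;1;SG").unwrap();
    // prints: hablar -> hablo ["V", "IND", "PRS", "1", "SG"]
    println!("{} -> {} {:?}", e.lemma, e.form, e.features);
}
```

Keeping the features as a list of tags, rather than one string, makes tag-level queries such as "has `IND`" a simple membership test.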

Installation

Homebrew (macOS/Linux)

brew tap joshrotenberg/brew
brew install unimorph

Cargo

cargo install unimorph

Docker

docker pull ghcr.io/joshrotenberg/unimorph-rs:latest

# Run with persistent data cache
docker run -v ~/.cache/unimorph:/data ghcr.io/joshrotenberg/unimorph-rs download spa
docker run -v ~/.cache/unimorph:/data ghcr.io/joshrotenberg/unimorph-rs inflect spa hablar

From source

git clone https://github.com/joshrotenberg/unimorph-rs
cd unimorph-rs
cargo install --path crates/unimorph-cli  # directory still named unimorph-cli

Features

  • Automatic downloads from UniMorph GitHub repositories
  • Transparent decompression of .xz, .gz, and .zip files (some large datasets are compressed)
  • SQLite storage for fast local queries
  • Multiple export formats: TSV, JSON Lines, Parquet
  • Python bindings via PyO3
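One common way to decompress transparently is to recognize a file by its leading magic bytes rather than trusting its extension. The std-only sketch below classifies a header as xz, gzip, or zip. It assumes nothing about how unimorph-core actually detects compression (it may simply go by extension), `Compression` and `sniff` are hypothetical names, and real decoding would need crates such as `xz2`, `flate2`, or `zip`.

```rust
/// Container formats named in the feature list, plus "not compressed".
#[derive(Debug, PartialEq, Clone, Copy)]
enum Compression {
    Xz,
    Gzip,
    Zip,
    Plain,
}

/// Classifies a file by its first bytes.
/// xz: FD 37 7A 58 5A 00, gzip: 1F 8B, zip local file header: 50 4B 03 04.
fn sniff(header: &[u8]) -> Compression {
    if header.starts_with(&[0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00]) {
        Compression::Xz
    } else if header.starts_with(&[0x1F, 0x8B]) {
        Compression::Gzip
    } else if header.starts_with(&[0x50, 0x4B, 0x03, 0x04]) {
        Compression::Zip
    } else {
        Compression::Plain
    }
}

fn main() {
    // First bytes of a gzip stream: magic number, then method 08 (deflate).
    let header: [u8; 4] = [0x1F, 0x8B, 0x08, 0x00];
    println!("{:?}", sniff(&header)); // prints "Gzip"
}
```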

Quick Start

# Download Spanish dataset
unimorph download spa

# Look up all forms of a verb
unimorph inflect -l spa hablar

# Analyze a surface form (reverse lookup)
unimorph analyze -l spa hablo

# Search with filters
unimorph search -l spa --lemma "habl%" --contains V,IND

# Dataset statistics
unimorph stats spa

# Export to JSON Lines
unimorph export spa -F jsonl -o spanish.jsonl
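The `search` filters can be read as SQL-style predicates. Assuming `--lemma` follows SQL `LIKE` semantics (`%` for any run of characters, `_` for exactly one), which fits the SQLite backend, and that `--contains V,IND` keeps only entries whose features include every listed tag, the matching logic looks roughly like this in-memory sketch. `like` and `contains_all` are hypothetical helpers; the CLI itself presumably evaluates these filters in SQL.

```rust
/// Minimal LIKE-style matcher: '%' matches any run of characters (including none),
/// '_' matches exactly one character, everything else matches literally.
/// Case-sensitive, unlike SQLite's default LIKE, which ignores ASCII case.
fn like(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut i, mut j) = (0, 0);
    let mut star: Option<usize> = None; // position of the last '%' seen
    let mut mark = 0; // text position that '%' is currently assumed to cover up to
    while j < t.len() {
        if i < p.len() && p[i] == '%' {
            star = Some(i);
            mark = j;
            i += 1;
        } else if i < p.len() && (p[i] == '_' || p[i] == t[j]) {
            i += 1;
            j += 1;
        } else if let Some(s) = star {
            // Backtrack: let the last '%' swallow one more character.
            i = s + 1;
            mark += 1;
            j = mark;
        } else {
            return false;
        }
    }
    // Trailing '%' wildcards can match the empty string.
    while i < p.len() && p[i] == '%' {
        i += 1;
    }
    i == p.len()
}

/// True when every comma-separated tag in `wanted` appears among the
/// ';'-separated `features`. Tag order does not matter.
fn contains_all(features: &str, wanted: &str) -> bool {
    let tags: Vec<&str> = features.split(';').collect();
    wanted
        .split(',')
        .map(|w| w.trim())
        .filter(|w| !w.is_empty())
        .all(|w| tags.contains(&w))
}

fn main() {
    let rows = [
        ("hablar", "hablo", "V;IND;PRS;1;SG"),
        ("hablar", "hablado", "V;V.PTCP;PST;MASC;SG"),
        ("ser", "soy", "V;IND;PRS;1;SG"),
    ];
    // Equivalent of: --lemma "habl%" --contains V,IND (prints only the hablo row)
    for &(lemma, form, feats) in rows.iter() {
        if like("habl%", lemma) && contains_all(feats, "V,IND") {
            println!("{}\t{}\t{}", lemma, form, feats);
        }
    }
}
```

Matching tags exactly (after splitting on `;`) keeps `V` from accidentally matching `V.PTCP`.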

Library Usage

// Requires the `tokio` crate with the "macros" and "rt-multi-thread" features.
use unimorph_core::{LangCode, Repository};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Download dataset if needed
    let repo = Repository::new()?;
    let lang: LangCode = "spa".parse()?;
    repo.ensure_dataset(&lang).await?;

    // Query the data
    let store = repo.store()?;
    
    // Get all forms of a lemma
    for entry in store.inflect(&lang, "hablar")? {
        println!("{} -> {} [{}]", entry.lemma, entry.form, entry.features);
    }

    // Reverse lookup: find lemmas for a surface form
    for entry in store.analyze(&lang, "hablo")? {
        println!("{} <- {} [{}]", entry.form, entry.lemma, entry.features);
    }

    Ok(())
}
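`inflect` and `analyze` are the same data viewed from opposite ends: one keys on the lemma, the other on the surface form. The std-only sketch below illustrates that with two `HashMap` indexes over (lemma, form, features) triples. It is not how unimorph-core answers queries (the crate stores data in SQLite), and `Index`, `build`, `inflect`, and `analyze` here are hypothetical.

```rust
use std::collections::HashMap;

type Row<'a> = (&'a str, &'a str, &'a str); // (lemma, form, features)

/// Hypothetical in-memory index: lemma -> rows and form -> rows.
struct Index<'a> {
    by_lemma: HashMap<&'a str, Vec<Row<'a>>>,
    by_form: HashMap<&'a str, Vec<Row<'a>>>,
}

impl<'a> Index<'a> {
    fn build(rows: &[Row<'a>]) -> Self {
        let mut by_lemma = HashMap::new();
        let mut by_form = HashMap::new();
        for &row in rows {
            by_lemma.entry(row.0).or_insert_with(Vec::new).push(row);
            by_form.entry(row.1).or_insert_with(Vec::new).push(row);
        }
        Index { by_lemma, by_form }
    }

    /// All forms of a lemma (the `inflect` direction).
    fn inflect(&self, lemma: &str) -> &[Row<'a>] {
        self.by_lemma.get(lemma).map(|v| v.as_slice()).unwrap_or(&[])
    }

    /// All analyses of a surface form (the `analyze` direction).
    fn analyze(&self, form: &str) -> &[Row<'a>] {
        self.by_form.get(form).map(|v| v.as_slice()).unwrap_or(&[])
    }
}

fn main() {
    let rows = [
        ("hablar", "hablo", "V;IND;PRS;1;SG"),
        ("hablar", "hablado", "V;V.PTCP;PST;MASC;SG"),
        ("ser", "soy", "V;IND;PRS;1;SG"),
    ];
    let idx = Index::build(&rows);
    for &(lemma, form, feats) in idx.inflect("hablar") {
        println!("{} -> {} [{}]", lemma, form, feats);
    }
    for &(lemma, form, feats) in idx.analyze("soy") {
        println!("{} <- {} [{}]", form, lemma, feats);
    }
}
```

Returning a list rather than a single row matters most for `analyze`, since one surface form can correspond to several lemmas or feature bundles.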

Documentation

Full documentation is available at joshrotenberg.github.io/unimorph-rs.

Python Bindings

pip install unimorph-rs

from unimorph import Store, download

download("ita")
store = Store()

for entry in store.inflect("ita", "parlare"):
    print(f"{entry.form}: {entry.features}")

See the Python documentation for more details.

Project Structure

unimorph-rs/
├── crates/
│   ├── unimorph-core/   # Core library: types, SQLite store, repository
│   ├── unimorph-cli/    # Command-line interface
│   └── unimorph-python/ # Python bindings (PyO3)
└── docs/                # mdBook documentation

License

Apache-2.0

Commit count: 91
