nl-sre-english

Crates.io: nl-sre-english
Version: 0.1.4
Created: 2026-01-21 23:19:40.893105+00
Updated: 2026-01-22 20:03:54.795149+00
Description: Deterministic Semantic Disambiguation Engine for English - 1500+ verbs, 5500+ words, Zero dependencies, Pure Rust
Repository: https://github.com/Yatrogenesis/nl-sre-english
ID: 2060348
Size: 8,942,422
Owner: Frank (Yatrogenesis)

NL-SRE-English

Deterministic Semantic Disambiguation Engine for English

A comprehensive English verb database with semantic disambiguation capabilities, designed for natural language processing, command parsing, and AI applications.

Features

  • 1,514 English verb entries organized into 25 functional categories
  • Multi-category support - verbs like "run" can belong to Movement AND Control
  • 80+ verb groups for fine-grained classification
  • Complete conjugation system (regular + irregular verbs)
  • Zero dependencies - Pure Rust implementation
  • Spell correction via Levenshtein distance
  • Phonetic matching via Soundex and Metaphone algorithms
  • Natural language command parser for action extraction

Functional Categories

The engine organizes verbs into semantic categories for easy programmatic access:

| Category | Description | Examples |
|---|---|---|
| Movement | Motion and locomotion | walk, run, fly, swim, climb |
| Perception | Sensing and perceiving | see, hear, feel, smell, taste |
| Communication | Speaking and speech acts | say, tell, speak, ask, answer |
| Cognition | Mental processes | think, know, believe, understand |
| Emotion | Emotional states | love, hate, fear, hope, enjoy |
| Physical | Physical manipulation | hit, cut, push, pull, throw |
| State | States of being | be, exist, remain, stay |
| Change | Change of state | become, grow, transform |
| Transfer | Giving and receiving | give, take, send, receive |
| Creation | Making and producing | make, create, build, write |
| Destruction | Breaking and destroying | destroy, break, kill, damage |
| Control | Controlling and managing | control, manage, lead, govern |
| Possession | Owning and having | own, have, possess, acquire |
| Social | Social interaction | meet, help, fight, cooperate |
| Consumption | Eating and drinking | eat, drink, consume, breathe |
| Body | Bodily functions | sleep, wake, sit, stand, lie |
| Weather | Weather phenomena | rain, snow, blow, shine |
| Measurement | Measuring and comparing | measure, weigh, count, compare |
| Aspectual | Beginning, ending, continuing | begin, end, continue, stop |
| Causation | Causing and enabling | cause, allow, prevent, force |
| Attempt | Trying and succeeding/failing | try, succeed, fail, practice |
| Modal | Modal and semi-modal | want, need, can, should |
| Position | Body position and location | put, place, set, remove |
| Connection | Joining and separating | connect, join, separate, split |
| Emission | Light and sound emission | shine, glow, ring, buzz |

Quick Start

use nl_sre_english::SemanticDisambiguator;
use nl_sre_english::verbs::FunctionalCategory;

fn main() {
    let disambiguator = SemanticDisambiguator::new();

    // Process a sentence
    let result = disambiguator.process("The cat runs quickly across the room");

    println!("Detected actions:");
    for action in &result.detected_actions {
        println!("  - {} (base: {}, category: {})",
            action.verb,
            action.base_form,
            action.category.name()
        );
    }

    // Get verbs by category
    let movement_verbs = disambiguator.verbs_by_category(FunctionalCategory::Movement);
    println!("Movement verbs: {:?}", &movement_verbs[..5]);
}

Verb Database Usage

use nl_sre_english::verbs::{VerbDatabase, FunctionalCategory, VerbGroup};

fn main() {
    let db = VerbDatabase::with_builtin();

    // Look up any verb form
    if let Some(entry) = db.lookup("running") {
        println!("Base form: {}", entry.base);           // "run"
        println!("Category: {}", entry.category.name()); // "Movement"
        println!("Group: {}", entry.group.name());       // "Run"
        println!("Irregular: {}", entry.irregular);      // true
        println!("Forms: {} / {} / {} / {}",
            entry.base,
            entry.past,
            entry.past_participle,
            entry.present_participle
        );
    }

    // Get all verbs in a category
    let emotions = db.by_category(FunctionalCategory::Emotion);
    for verb in emotions.iter().take(10) {
        println!("{}: {}", verb.base, verb.synonyms.join(", "));
    }

    // Get all verbs in a specific group
    let running_verbs = db.by_group(VerbGroup::Run);
    // Returns: run, sprint, dash, race, jog, rush, hurry, bolt...
}
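
The lookup of "running" above resolves an inflected form to its base entry. One plausible way to support this is an index from every surface form to its entry. The sketch below assumes that design; `VerbForms`, `FormIndex`, and `sample_index` are hypothetical names, not the crate's types.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified entry; the crate's real entry has more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct VerbForms {
    pub base: &'static str,
    pub third_person: &'static str,
    pub past: &'static str,
    pub past_participle: &'static str,
    pub present_participle: &'static str,
}

/// Maps every surface form ("runs", "ran", "running", ...) to its entry.
pub struct FormIndex {
    entries: Vec<VerbForms>,
    by_form: HashMap<&'static str, usize>,
}

impl FormIndex {
    pub fn new(entries: Vec<VerbForms>) -> Self {
        let mut by_form = HashMap::new();
        for (i, e) in entries.iter().enumerate() {
            let forms = [
                e.base,
                e.third_person,
                e.past,
                e.past_participle,
                e.present_participle,
            ];
            for form in forms {
                // Keep the first entry when two verbs share a surface form.
                by_form.entry(form).or_insert(i);
            }
        }
        FormIndex { entries, by_form }
    }

    /// Case-insensitive lookup of any indexed form.
    pub fn lookup(&self, word: &str) -> Option<&VerbForms> {
        let key = word.to_lowercase();
        self.by_form.get(key.as_str()).map(|&i| &self.entries[i])
    }

    /// Number of distinct surface forms in the index.
    pub fn form_count(&self) -> usize {
        self.by_form.len()
    }
}

/// Two sample entries: one irregular verb, one regular verb.
pub fn sample_index() -> FormIndex {
    FormIndex::new(vec![
        VerbForms {
            base: "run",
            third_person: "runs",
            past: "ran",
            past_participle: "run",
            present_participle: "running",
        },
        VerbForms {
            base: "walk",
            third_person: "walks",
            past: "walked",
            past_participle: "walked",
            present_participle: "walking",
        },
    ])
}
```

Because irregular forms such as "ran" are stored explicitly rather than derived by suffix stripping, regular and irregular verbs share the same lookup path.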

Command Parser

use nl_sre_english::command_parser::CommandParser;

fn main() {
    let mut parser = CommandParser::new();

    // Parse natural language commands
    if let Some(cmd) = parser.parse("please walk to the store") {
        println!("Action: {}", cmd.action);       // "walk"
        println!("Category: {}", cmd.category.name()); // "Movement"
        println!("Subject: {:?}", cmd.subject);   // Some("please")
        println!("Object: {:?}", cmd.object);     // Some("to the store")
    }

    // Parse multiple commands
    let commands = parser.parse_all("Run to the store. Buy some milk. Come back home.");
    // Returns 3 parsed commands
}
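
As a rough illustration of the parsing flow above, the following std-only sketch splits text into sentences, finds the first known verb, and treats the words before it as the subject and the words after it as the object. That split is an assumption inferred from the sample output (where "please" lands in `subject`); `SketchCommand`, `KNOWN_VERBS`, `parse_one`, and `parse_all` are hypothetical names, and the crate's parser may work differently.

```rust
/// Hypothetical, simplified command; not the crate's command type.
#[derive(Debug, PartialEq)]
pub struct SketchCommand {
    pub action: String,
    pub subject: Option<String>,
    pub object: Option<String>,
}

/// Tiny stand-in for the verb database.
const KNOWN_VERBS: [&str; 5] = ["walk", "run", "buy", "come", "go"];

/// Parse one sentence: the first known verb is the action.
pub fn parse_one(sentence: &str) -> Option<SketchCommand> {
    let words: Vec<String> = sentence
        .split_whitespace()
        .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()).to_lowercase())
        .filter(|w| !w.is_empty())
        .collect();

    let pos = words
        .iter()
        .position(|w| KNOWN_VERBS.contains(&w.as_str()))?;

    // Join a slice of words, or return None when the slice is empty.
    let join = |part: &[String]| {
        if part.is_empty() {
            None
        } else {
            Some(part.join(" "))
        }
    };

    Some(SketchCommand {
        action: words[pos].clone(),
        subject: join(&words[..pos]),
        object: join(&words[pos + 1..]),
    })
}

/// Split on sentence terminators and parse each sentence independently.
pub fn parse_all(text: &str) -> Vec<SketchCommand> {
    text.split(|c: char| matches!(c, '.' | '!' | '?'))
        .filter_map(parse_one)
        .collect()
}
```

Sentences without a known verb are dropped, so the three-sentence input above yields three commands with the actions "run", "buy", and "come".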

Database Statistics

Total verb entries: 1,514 (including multi-category)
Unique verb base forms: 1,312
Irregular verbs: 133
Regular verbs: 1,381
Total forms indexed: ~5,300
Functional categories: 25
Verb groups: 80+
Dictionary words: 4,971 (COCA corpus)

Performance (Intel i7-12650H, 16 GB RAM)

| Operation | Throughput | Latency |
|---|---|---|
| Verb lookup | 12.2M ops/sec | 0.08 µs |
| Spell correction (BK-Tree) | 1.2K ops/sec | 804 µs |
| Command parsing | 918K ops/sec | 1.1 µs |
| Contraction expansion | 4.6M ops/sec | 0.22 µs |

BK-Tree speedup: 1.6x over linear search for fuzzy matching.
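
The throughput and latency columns are two views of the same measurement: mean latency in microseconds is 1,000,000 divided by operations per second. The sketch below shows that conversion and a naive timing loop. It is not the crate's benchmark harness, and the helper names `latency_us`, `ops_per_sec`, and `measure` are illustrative.

```rust
use std::time::Instant;

/// Mean latency in microseconds implied by a throughput in ops/sec.
pub fn latency_us(ops_per_sec: f64) -> f64 {
    1_000_000.0 / ops_per_sec
}

/// Throughput in ops/sec implied by a mean latency in microseconds.
pub fn ops_per_sec(latency_us: f64) -> f64 {
    1_000_000.0 / latency_us
}

/// Naive single-threaded timing loop returning (ops/sec, µs per op).
/// Real benchmarks should add warm-up runs and guard against the
/// compiler optimizing the measured work away.
pub fn measure<F: FnMut()>(iterations: u32, mut op: F) -> (f64, f64) {
    let start = Instant::now();
    for _ in 0..iterations {
        op();
    }
    let secs = start.elapsed().as_secs_f64();
    let throughput = f64::from(iterations) / secs;
    (throughput, latency_us(throughput))
}
```

For example, 12.2M ops/sec corresponds to about 0.082 µs per lookup, and an 804 µs spell correction corresponds to about 1,244 ops/sec, matching the rounded figures in the table.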

Multi-Category Verbs

Some verbs belong to multiple semantic categories depending on context:

use nl_sre_english::VerbDatabase;

let db = VerbDatabase::with_builtin();

// "run" has multiple meanings
let categories = db.get_all_categories("run");
// Returns: [Movement, Control]
// - "run to the store" -> Movement
// - "run a company" -> Control

// Get all entries for a verb
let entries = db.lookup_all("run");
// Returns Vec with 2 entries, each with different category
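
A multi-category verb can be modeled as a multimap from base form to a list of categories. The sketch below assumes that representation; `Category` and `MultiCategoryIndex` are hypothetical stand-ins, not the crate's `FunctionalCategory` or `VerbDatabase`.

```rust
use std::collections::HashMap;

/// Hypothetical subset of the functional categories.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    Movement,
    Control,
    Weather,
    Emission,
}

/// Maps a base form to every category it belongs to, in insertion order.
pub struct MultiCategoryIndex {
    by_base: HashMap<&'static str, Vec<Category>>,
}

impl MultiCategoryIndex {
    pub fn new(pairs: &[(&'static str, Category)]) -> Self {
        let mut by_base: HashMap<&'static str, Vec<Category>> = HashMap::new();
        for &(base, category) in pairs {
            let categories = by_base.entry(base).or_default();
            // Ignore duplicate (verb, category) pairs.
            if !categories.contains(&category) {
                categories.push(category);
            }
        }
        MultiCategoryIndex { by_base }
    }

    /// All categories for a base form; empty when the verb is unknown.
    pub fn categories(&self, base: &str) -> &[Category] {
        self.by_base.get(base).map(Vec::as_slice).unwrap_or(&[])
    }
}
```

Registering `("run", Category::Movement)` and `("run", Category::Control)` makes `categories("run")` return both, in that order. Choosing between them for a given sentence is the job of the disambiguation step, which this sketch does not attempt.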

Architecture

┌─────────────────────────────────────────────────────┐
│                  SemanticDisambiguator              │
├─────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌────────────┐  ┌────────────┐  │
│  │ VerbDatabase │  │  Grammar   │  │ Dictionary │  │
│  │  1312 verbs  │  │   Parser   │  │   5500+    │  │
│  │ 25 categories│  │   POS tag  │  │   words    │  │
│  └──────────────┘  └────────────┘  └────────────┘  │
├─────────────────────────────────────────────────────┤
│              CommandParser (NL → Structured)        │
└─────────────────────────────────────────────────────┘

Building

cargo build --release

Running Examples

cargo run --example verb_groups

Testing

cargo test

Optimizations (v0.1.1)

BK-Tree Fuzzy Search

The spell correction system has been optimized with a Burkhard-Keller Tree (BK-Tree) implementation:

  • Before: O(N×M) complexity - compared every word in dictionary
  • After: O(log N × M) average - uses triangle inequality pruning

use nl_sre_english::EnglishDictionary;

let dict = EnglishDictionary::new();
// Fast fuzzy search for spell correction
let suggestions = dict.find_similar("helo", 2);
// Returns e.g.: [("hello", 1), ("help", 1), ("held", 1), ...]

Benchmark improvement: ~1.6x faster for typical spell correction queries (4,971 word dictionary).
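
The triangle-inequality pruning mentioned above can be sketched in std-only Rust. This is a generic BK-tree over Levenshtein distance, not the crate's code: `BkTree`, `Node`, and `levenshtein` are illustrative names, and the method name `find_similar` is borrowed from the example only for readability. How much pruning helps depends on the dictionary and the distance threshold.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Levenshtein edit distance using two rolling rows.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, &ca) in a.iter().enumerate() {
        let mut cur = Vec::with_capacity(b.len() + 1);
        cur.push(i + 1);
        for (j, &cb) in b.iter().enumerate() {
            let substitute = prev[j] + if ca == cb { 0 } else { 1 };
            let delete = prev[j + 1] + 1;
            let insert = cur[j] + 1;
            cur.push(substitute.min(delete).min(insert));
        }
        prev = cur;
    }
    prev[b.len()]
}

struct Node {
    word: String,
    /// Children keyed by their distance to this node's word.
    children: HashMap<usize, Node>,
}

impl Node {
    fn new(word: &str) -> Self {
        Node { word: word.to_string(), children: HashMap::new() }
    }

    fn insert(&mut self, word: &str) {
        let d = levenshtein(&self.word, word);
        if d == 0 {
            return; // word already present
        }
        match self.children.entry(d) {
            Entry::Occupied(mut e) => e.get_mut().insert(word),
            Entry::Vacant(e) => {
                e.insert(Node::new(word));
            }
        }
    }
}

/// Minimal BK-tree for fuzzy string search.
pub struct BkTree {
    root: Option<Node>,
}

impl BkTree {
    pub fn new() -> Self {
        BkTree { root: None }
    }

    pub fn insert(&mut self, word: &str) {
        match &mut self.root {
            Some(root) => root.insert(word),
            None => self.root = Some(Node::new(word)),
        }
    }

    /// All words within `max_dist` edits of `query`, closest first.
    pub fn find_similar(&self, query: &str, max_dist: usize) -> Vec<(String, usize)> {
        let mut hits = Vec::new();
        let mut stack: Vec<&Node> = self.root.iter().collect();
        while let Some(node) = stack.pop() {
            let d = levenshtein(&node.word, query);
            if d <= max_dist {
                hits.push((node.word.clone(), d));
            }
            // Triangle inequality: a match can only sit under an edge
            // whose label lies in [d - max_dist, d + max_dist].
            let lo = d.saturating_sub(max_dist);
            let hi = d + max_dist;
            for (&edge, child) in &node.children {
                if edge >= lo && edge <= hi {
                    stack.push(child);
                }
            }
        }
        hits.sort_by(|a, b| a.1.cmp(&b.1).then_with(|| a.0.cmp(&b.0)));
        hits
    }
}
```

Subtrees whose edge label falls outside the window are skipped without computing any distances, which is where the saving over a linear scan comes from.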

Contraction Expansion

The tokenizer now automatically expands 50+ common English contractions:

use nl_sre_english::EnglishGrammar;

let grammar = EnglishGrammar::new();

// "don't" expands to "do" + "not"
let tokens = grammar.tokenize("I don't know");
assert_eq!(tokens, vec!["i", "do", "not", "know"]);

// "we'll" expands to "we" + "will"
let tokens = grammar.tokenize("We'll see");
assert_eq!(tokens, vec!["we", "will", "see"]);

Supported contractions:

  • Negative: don't, doesn't, didn't, won't, can't, couldn't, shouldn't, wouldn't, isn't, aren't, wasn't, weren't, haven't, hasn't, hadn't...
  • Pronoun + be: I'm, you're, he's, she's, it's, we're, they're...
  • Pronoun/modal + have: I've, you've, we've, they've, could've, would've, should've...
  • Pronoun + will: I'll, you'll, he'll, she'll, we'll, they'll...
  • Pronoun + would/had: I'd, you'd, he'd, she'd, we'd, they'd...
  • Other: let's, ain't...
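
One way to implement this kind of expansion is a small table of irregular contractions plus suffix rules. The sketch below follows that approach under stated assumptions: `expand` and this standalone `tokenize` are illustrative and not the crate's `EnglishGrammar` implementation. It deliberately leaves 's and 'd untouched, because they are ambiguous (is/has, would/had).

```rust
/// Suffix rules applied when no irregular form matches.
const SUFFIX_RULES: [(&str, &str); 5] = [
    ("n't", "not"),
    ("'re", "are"),
    ("'ve", "have"),
    ("'ll", "will"),
    ("'m", "am"),
];

/// Expand one lowercase token into one or more words.
fn expand(token: &str) -> Vec<String> {
    // Irregular forms that the suffix rules would get wrong.
    let irregular: Option<[&str; 2]> = match token {
        "won't" => Some(["will", "not"]),
        "can't" => Some(["can", "not"]),
        "shan't" => Some(["shall", "not"]),
        "let's" => Some(["let", "us"]),
        _ => None,
    };
    if let Some(parts) = irregular {
        return parts.iter().map(|p| p.to_string()).collect();
    }
    for (suffix, word) in SUFFIX_RULES {
        if let Some(stem) = token.strip_suffix(suffix) {
            if !stem.is_empty() {
                return vec![stem.to_string(), word.to_string()];
            }
        }
    }
    vec![token.to_string()]
}

/// Lowercase, strip surrounding punctuation, and expand contractions.
pub fn tokenize(text: &str) -> Vec<String> {
    text.split_whitespace()
        .map(|w| {
            w.trim_matches(|c: char| !c.is_alphanumeric())
                .to_lowercase()
                // Normalize typographic apostrophes before matching.
                .replace('\u{2019}', "'")
        })
        .filter(|w| !w.is_empty())
        .flat_map(|w| expand(&w))
        .collect()
}
```

Irregular forms are checked first because a pure suffix rule would turn "won't" into "wo" + "not".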

Regression Test Suite

Added comprehensive regression tests (tests/regression.rs) covering:

  • Dictionary word validation (100 most common words)
  • Spell correction accuracy
  • Contraction expansion coverage
  • Command parsing for all categories
  • Disambiguator action detection
  • Edge cases and performance benchmarks

Use Cases

  • AI Assistants: Understand user intent and extract actions
  • Chatbots: Parse commands and respond appropriately
  • Game Development: NPC command interpretation
  • Voice Interfaces: Convert speech to structured commands
  • Text Analysis: Action and intent extraction
  • Educational Software: Verb conjugation and grammar tools
  • Robotics: Natural language to robot commands

Scalability

This repository demonstrates the logical architecture of a deterministic NLP engine - the algorithmic foundations, data structures, and API design patterns. All components are production-ready and optimized for performance.

For industrial-scale applications requiring:

  • 160K+ word lexicons with WordNet integration
  • Zero-copy memory-mapped binary formats for sub-millisecond loading
  • Word Sense Disambiguation (WSD) with configurable weights
  • Synset hierarchies (hypernyms/hyponyms) for semantic reasoning

Contact Avermex Research Division for enterprise implementation details.

License

MIT License - See LICENSE for details.

Author

Francisco Molina-Burgos
Avermex Research Division
Merida, Yucatan, Mexico


Part of the NL-SRE (Natural Language Semantic Rule Engine) family

See also: NL-SRE-Semantico (Spanish)

Commit count: 18
