is-it-slop-preprocessing

Crate: is-it-slop-preprocessing (Crates.io / lib.rs)
Version: 0.4.0
Created: 2025-12-02
Updated: 2025-12-07
Description: A Rust library and CLI tool to detect AI-generated slop text using machine learning.
Repository: https://github.com/SamBroomy/is-it-slop
Owner: Broomy (SamBroomy)
Size: 79,858 bytes

README

is-it-slop

Fast AI text detection using TF-IDF and ensemble classifiers.

Features

  • Fast: Rust-based preprocessing
  • Accurate: 96%+ accuracy (F1 0.96, MCC 0.93)
  • Portable: ONNX model embedded in CLI binary
  • Dual APIs: Rust library + Python bindings

Installation

CLI (Rust)

cargo install is-it-slop --features cli

Model artifacts (16 MB) are downloaded automatically during build from GitHub releases.

Python Package

uv add is-it-slop
# or
pip install is-it-slop

Rust Library

cargo add is-it-slop

Quick Start

CLI

is-it-slop "Your text here"
# Output: 0.234 (AI probability)

is-it-slop "Text" --format class
# Output: 0 (Human) or 1 (AI)

Python

from is_it_slop import is_this_slop

result = is_this_slop("Your text here")
print(result.classification)
# Output: Human
print(f"AI probability: {result.ai_probability:.2%}")
# Output: AI probability: 15.23%

Rust

use is_it_slop::Predictor;

// predict() is fallible, so call it from a function that can return an error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let predictor = Predictor::new();
    let prediction = predictor.predict("Your text here")?;
    println!("AI probability: {}", prediction.ai_probability());
    Ok(())
}

Architecture

Training (Python):
  Texts -> RustTfidfVectorizer -> TF-IDF -> sklearn models -> ONNX

Inference (Rust CLI):
  Texts -> TfidfVectorizer (Rust) -> TF-IDF -> ONNX Runtime -> Prediction
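The TF-IDF step in both pipelines can be illustrated with a minimal stdlib-only sketch (this is not the crate's vectorizer, whose tokenization and weighting may differ; it just shows the kind of term weighting the pipeline computes before the classifier runs):

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} map per document (smoothed idf)."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors

docs = ["the cat sat", "the dog sat", "a dog barked"]
vecs = tfidf(docs)
# "the" appears in two of three docs, so it is down-weighted relative to "cat".
assert vecs[0]["cat"] > vecs[0]["the"]
```

The classifier (sklearn during training, the exported ONNX graph at inference) then consumes these weight vectors.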

Why separate artifacts?

  • Vectorizer: fast Rust preprocessing.
  • Model: portable ONNX format (no Python runtime needed).

Python bindings make it easy to train a model in Python and use it in Rust.

Training

See notebooks/dataset_curation.ipynb for which datasets were used, and notebooks/train.ipynb for the training pipeline.

Multiple diverse datasets were used with great care, both to avoid overfitting to any single source of human or AI-generated text and to keep the model from simply learning artifacts of specific datasets.

For more information, look in the notebooks/ directory.

License

MIT
