dtxt-detect

Crates.io	dtxt-detect
lib.rs	dtxt-detect
version	1.0.0
created_at	2025-09-11 16:42:47.821597+00
updated_at	2025-09-11 16:42:47.821597+00
description	Rust library for dangerous text detection, optimized for high speeds.
homepage	https://github.com/gi-dellav/dtxt-detect
repository	https://github.com/gi-dellav/dtxt-detect
id	1834049
size	12,119

(gi-dellav)

documentation

https://docs.rs/dtxt-detect

README

dtxt-detect

A novel algorithm and Rust implementation for dangerous text detection, optimized for high-throughput use cases such as LLM inference or dataset evaluation.

How does it work

dtxt-detect works by splitting the set of key patterns into three tiers:

Tier 1: patterns that can be used to build dangerous text (e.g. "build", "how to")
Tier 2: patterns that are extremely common in dangerous text (e.g. "bomb", "violence")
Tier 3: patterns that always define dangerous text (e.g. racial slurs)
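The three tiers can be modeled as a small pattern table that reports where each match occurs. The sketch below is illustrative only: `Tier`, `PatternSet` and `scan` are hypothetical names, not the crate's API, and the naive substring search stands in for the SIMD-enabled matching the library uses.

```rust
/// Hypothetical tier labels mirroring the three tiers listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Tier {
    One,   // can be used to build dangerous text, e.g. "build", "how to"
    Two,   // extremely common in dangerous text, e.g. "bomb", "violence"
    Three, // always defines dangerous text
}

/// Illustrative pattern table: each lowercase pattern is tagged with a tier.
struct PatternSet {
    patterns: Vec<(String, Tier)>,
}

impl PatternSet {
    fn new(entries: &[(&str, Tier)]) -> Self {
        let patterns = entries
            .iter()
            .map(|(p, t)| (p.to_lowercase(), *t))
            .collect();
        PatternSet { patterns }
    }

    /// Returns (byte offset in the lowercased text, tier) for every occurrence
    /// of every pattern, sorted by offset. This is a naive substring search,
    /// not SIMD, and it also matches inside longer words.
    fn scan(&self, text: &str) -> Vec<(usize, Tier)> {
        let lowered = text.to_lowercase();
        let mut hits = Vec::new();
        for (pat, tier) in &self.patterns {
            for (pos, _) in lowered.match_indices(pat.as_str()) {
                hits.push((pos, *tier));
            }
        }
        hits.sort();
        hits
    }
}

fn main() {
    let set = PatternSet::new(&[
        ("how to", Tier::One),
        ("build", Tier::One),
        ("bomb", Tier::Two),
    ]);
    // prints [(0, One), (7, One), (15, Two)]
    println!("{:?}", set.scan("How to build a bomb"));
}
```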

By default, dtxt-detect normalizes the input string (to avoid Unicode-based exploits) and counts the tier 1, tier 2 and tier 3 patterns it contains. The text is flagged as dangerous if it contains at least one tier 1 pattern and one tier 2 pattern within a certain distance of each other (the q value), or at least one tier 3 pattern. Parameters such as the number of patterns required to flag a text, and the conditions that must be met, can be configured.
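A minimal sketch of the default rule, assuming the q value is measured in words (the unit is not specified above) and using a hypothetical single-word lookup table. `normalize` only lowercases and strips zero-width characters; it is a simplified stand-in for full Unicode normalization (such as NFKC), not dtxt-detect's own normalization step. All function names here are illustrative.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    One,
    Two,
    Three,
}

/// Simplified normalization: drops zero-width/format characters often used
/// to split words (e.g. "bo\u{200B}mb") and lowercases everything.
fn normalize(input: &str) -> String {
    input
        .chars()
        .filter(|&c| !matches!(c, '\u{200B}' | '\u{200C}' | '\u{200D}' | '\u{2060}' | '\u{FEFF}'))
        .flat_map(|c| c.to_lowercase())
        .collect()
}

/// Hypothetical single-word pattern table.
fn tier_of(word: &str) -> Option<Tier> {
    match word {
        "build" => Some(Tier::One),
        "bomb" | "violence" => Some(Tier::Two),
        "tier3example" => Some(Tier::Three), // placeholder for a tier 3 term
        _ => None,
    }
}

/// Flags `text` if it has a tier 3 pattern, or a tier 1 and a tier 2
/// pattern at most `q` words apart.
fn is_dangerous(text: &str, q: usize) -> bool {
    let norm = normalize(text);
    // (word index, tier) for every word that matches a pattern.
    let hits: Vec<(usize, Tier)> = norm
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .enumerate()
        .filter_map(|(i, w)| tier_of(w).map(|t| (i, t)))
        .collect();

    if hits.iter().any(|&(_, t)| t == Tier::Three) {
        return true;
    }
    hits.iter().any(|&(i, a)| {
        a == Tier::One && hits.iter().any(|&(j, b)| b == Tier::Two && i.abs_diff(j) <= q)
    })
}

fn main() {
    // The zero-width space is removed before matching, so this prints true.
    println!("{}", is_dangerous("How to build a bo\u{200B}mb", 3));
    // "build" and "bomb" are 9 words apart, so with q = 3 this prints false.
    println!("{}", is_dangerous("build the house, then watch a movie about a bomb", 3));
}
```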

Rust implementation

The current Rust implementation uses SIMD-enabled algorithms for pattern matching. It lets you build pattern lists once and reuse them across multiple runs, reports the number of fails (texts flagged as dangerous) and warnings (texts flagged as potentially dangerous), and lets you configure which conditions trigger a fail.
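The build-once, run-many workflow with fail and warning counts could look roughly like this. `Detector`, `Conditions`, `Report` and `check_batch` are hypothetical names, not the crate's API. The fail rule mirrors the current document-level behavior (no distance check), and counting any non-failing match as a warning is an assumption, since the exact warning criteria are not described.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    One,
    Two,
    Three,
}

/// Hypothetical knobs deciding what triggers a fail.
#[derive(Debug, Clone, Copy)]
struct Conditions {
    min_tier1: usize,  // tier 1 matches needed (together with tier 2) to fail
    min_tier2: usize,  // tier 2 matches needed (together with tier 1) to fail
    tier3_fails: bool, // whether one tier 3 match fails on its own
}

#[derive(Debug, Default, PartialEq, Eq)]
struct Report {
    fails: usize,
    warnings: usize,
}

/// Pattern table built once and reused for every text checked.
struct Detector {
    table: HashMap<String, Tier>,
    cond: Conditions,
}

impl Detector {
    fn new(patterns: &[(&str, Tier)], cond: Conditions) -> Self {
        let table: HashMap<String, Tier> =
            patterns.iter().map(|(p, t)| (p.to_lowercase(), *t)).collect();
        Detector { table, cond }
    }

    /// Counts tier 1/2/3 word matches anywhere in the text (no distance check).
    fn counts(&self, text: &str) -> [usize; 3] {
        let mut c = [0; 3];
        let lowered = text.to_lowercase();
        for word in lowered.split(|ch: char| !ch.is_alphanumeric()) {
            match self.table.get(word) {
                Some(Tier::One) => c[0] += 1,
                Some(Tier::Two) => c[1] += 1,
                Some(Tier::Three) => c[2] += 1,
                None => {}
            }
        }
        c
    }

    fn check_batch(&self, texts: &[&str]) -> Report {
        let mut report = Report::default();
        for text in texts {
            let [t1, t2, t3] = self.counts(text);
            let fail = (self.cond.tier3_fails && t3 > 0)
                || (t1 >= self.cond.min_tier1 && t2 >= self.cond.min_tier2);
            if fail {
                report.fails += 1;
            } else if t1 + t2 + t3 > 0 {
                // Assumption: any match that does not fail counts as a warning.
                report.warnings += 1;
            }
        }
        report
    }
}

fn main() {
    let detector = Detector::new(
        &[("build", Tier::One), ("bomb", Tier::Two), ("violence", Tier::Two)],
        Conditions { min_tier1: 1, min_tier2: 1, tier3_fails: true },
    );
    let texts = ["how to build a bomb", "a history of violence", "build a shelf"];
    // prints Report { fails: 1, warnings: 2 }
    println!("{:?}", detector.check_batch(&texts));
}
```

Because the table is built in `Detector::new`, the hashing cost is paid once and every later `check_batch` call only does lookups.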

The current Rust implementation does not yet support restricting the tier 1/tier 2 correlation to patterns within a certain distance (the q value): as of now, any tier 1 pattern combined with any tier 2 pattern anywhere in the input triggers the flag. This means dtxt-detect will struggle with large text chunks, where unrelated tier 1 and tier 2 patterns are more likely to co-occur; implementing this feature is already planned.
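Until q-distance is implemented, one possible stopgap (not part of dtxt-detect) is to split large texts into overlapping word windows and run the document-level check on each window. With windows of 2q words and a stride of q, every tier 1/tier 2 pair at most q words apart falls inside some window; pairs up to 2q - 1 words apart can also be flagged, so this only approximates the planned feature. `windowed_flag` and `doc_level_flag` are hypothetical helpers with a toy two-word pattern set.

```rust
/// Hypothetical document-level check mirroring the current behavior:
/// flags if any tier 1 word and any tier 2 word appear together.
fn doc_level_flag(words: &[&str]) -> bool {
    let has_t1 = words.iter().any(|w| *w == "build");
    let has_t2 = words.iter().any(|w| *w == "bomb");
    has_t1 && has_t2
}

/// Runs `check` over overlapping windows of 2*q words with a stride of q.
/// A pair at most q words apart always lands in at least one window.
fn windowed_flag(text: &str, q: usize, check: impl Fn(&[&str]) -> bool) -> bool {
    assert!(q > 0, "q must be at least 1");
    let lowered = text.to_lowercase();
    let words: Vec<&str> = lowered
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .collect();
    if words.len() <= 2 * q {
        return check(&words); // the whole text fits in one window
    }
    let mut start = 0;
    while start < words.len() {
        let end = (start + 2 * q).min(words.len());
        if check(&words[start..end]) {
            return true;
        }
        if end == words.len() {
            break; // last window reached the end of the text
        }
        start += q;
    }
    false
}

fn main() {
    let text = "build the house then watch a long movie about a bomb";
    let words: Vec<&str> = text.split(' ').collect();
    // Document-level check flags the unrelated pair: prints true.
    println!("{}", doc_level_flag(&words));
    // "build" and "bomb" are 10 words apart, so with q = 3: prints false.
    println!("{}", windowed_flag(text, 3, doc_level_flag));
}
```

Each word is checked in at most two windows, so the extra cost over a single pass stays roughly constant.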

Future milestones

Implement q-distance in the Rust library
Export and publish a Python package
Make a ready-to-use dataset for dtxt-detect

Commit count: 12
