dtxt-detect

Crates.io: dtxt-detect
lib.rs: dtxt-detect
version: 1.0.0
created_at: 2025-09-11 16:42:47.821597+00
updated_at: 2025-09-11 16:42:47.821597+00
description: Rust library for dangerous text detection, optimized for high speeds.
homepage: https://github.com/gi-dellav/dtxt-detect
repository: https://github.com/gi-dellav/dtxt-detect
id: 1834049
size: 12,119
(gi-dellav)

documentation

https://docs.rs/dtxt-detect

README

dtxt-detect

Novel algorithm and Rust implementation for dangerous text detection, optimized for high-speed workloads such as LLM inference or dataset evaluation.

How it works

dtxt-detect works by splitting the set of key patterns into three tiers:

  • Tier 1, for patterns that can be used to build dangerous text (e.g. build, how to)
  • Tier 2, for patterns that are extremely common in dangerous text (e.g. bomb, violence)
  • Tier 3, for patterns that always indicate dangerous text (e.g. racial slurs)

By default, dtxt-detect first normalizes the input string (in order to avoid Unicode-based exploits), then counts the tier 1, tier 2 and tier 3 patterns it finds. The text is flagged as dangerous if at least one tier 1 pattern and one tier 2 pattern occur within a certain distance of each other (the q value), or if at least one tier 3 pattern is present. Both the number of patterns required to flag a text and the conditions that have to be met can be configured.
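The default rule described above can be sketched in a few lines of plain Rust. This is only an illustration under stated assumptions, not the crate's API: the names `Patterns`, `Counts`, `normalize`, `count_tiers` and `is_dangerous` are hypothetical, the normalization is a simplified stand-in (lowercasing and stripping punctuation) for real Unicode normalization, matching is naive substring search, and the q-distance check is left out.

```rust
/// Hypothetical pattern lists, one per tier (patterns are assumed to be
/// non-empty, lowercase strings). Not the crate's actual types.
pub struct Patterns {
    pub tier1: Vec<String>,
    pub tier2: Vec<String>,
    pub tier3: Vec<String>,
}

/// Number of pattern hits found per tier.
#[derive(Debug, PartialEq, Eq)]
pub struct Counts {
    pub tier1: usize,
    pub tier2: usize,
    pub tier3: usize,
}

/// Simplified normalization (an assumption): lowercase the text, turn every
/// non-alphanumeric character into a space, and collapse runs of whitespace.
/// A real implementation would also need proper Unicode normalization.
pub fn normalize(input: &str) -> String {
    input
        .chars()
        .flat_map(char::to_lowercase)
        .map(|c| if c.is_alphanumeric() { c } else { ' ' })
        .collect::<String>()
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
}

/// Count non-overlapping substring occurrences of every pattern in `text`.
fn count_hits(text: &str, patterns: &[String]) -> usize {
    patterns
        .iter()
        .map(|p| text.matches(p.as_str()).count())
        .sum()
}

/// Normalize the input and count the hits for each tier.
pub fn count_tiers(text: &str, patterns: &Patterns) -> Counts {
    let normalized = normalize(text);
    Counts {
        tier1: count_hits(&normalized, &patterns.tier1),
        tier2: count_hits(&normalized, &patterns.tier2),
        tier3: count_hits(&normalized, &patterns.tier3),
    }
}

/// Default rule without the distance check: any tier 3 hit, or at least one
/// tier 1 hit together with at least one tier 2 hit.
pub fn is_dangerous(counts: &Counts) -> bool {
    counts.tier3 >= 1 || (counts.tier1 >= 1 && counts.tier2 >= 1)
}
```

With tier 1 set to "build" and "how to" and tier 2 set to "bomb", a question combining both tiers is flagged, while "How to build a shed" only produces tier 1 hits and is not.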

Rust implementation

The current Rust implementation uses SIMD-enabled algorithms for pattern matching. Pattern lists can be built once and reused across multiple executions. It reports the number of fails (flagged as dangerous) and warnings (flagged as potentially dangerous), and lets you set which conditions should trigger a fail.
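To make the build-once, scan-many workflow concrete, here is a minimal sketch using only the standard library. Everything in it is hypothetical and does not reflect the crate's real API: `Detector`, `Config`, `Verdict`, `Summary` and `summarize` are invented names, matching is naive substring search rather than SIMD, and treating "some pattern hit, but no fail condition met" as a warning is an assumption, since the README does not define warnings precisely.

```rust
/// Outcome for one text. The Warning rule below is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Clean,
    Warning,
    Fail,
}

/// Hypothetical thresholds: minimum hits per tier needed to trigger a fail.
#[derive(Debug, Clone, Copy)]
pub struct Config {
    pub min_tier1: usize,
    pub min_tier2: usize,
    pub min_tier3: usize,
}

impl Default for Config {
    fn default() -> Self {
        Config { min_tier1: 1, min_tier2: 1, min_tier3: 1 }
    }
}

/// Pattern lists prepared once and reused for every call to `check`.
pub struct Detector {
    tier1: Vec<String>,
    tier2: Vec<String>,
    tier3: Vec<String>,
    config: Config,
}

fn prepare(list: &[&str]) -> Vec<String> {
    list.iter().map(|p| p.to_lowercase()).collect()
}

fn count_hits(text: &str, patterns: &[String]) -> usize {
    patterns
        .iter()
        .map(|p| text.matches(p.as_str()).count())
        .sum()
}

impl Detector {
    /// Build the detector a single time, up front.
    pub fn new(tier1: &[&str], tier2: &[&str], tier3: &[&str], config: Config) -> Self {
        Detector {
            tier1: prepare(tier1),
            tier2: prepare(tier2),
            tier3: prepare(tier3),
            config,
        }
    }

    /// Classify one text (lowercasing stands in for real normalization).
    pub fn check(&self, text: &str) -> Verdict {
        let text = text.to_lowercase();
        let t1 = count_hits(&text, &self.tier1);
        let t2 = count_hits(&text, &self.tier2);
        let t3 = count_hits(&text, &self.tier3);
        let cfg = &self.config;
        if t3 >= cfg.min_tier3 || (t1 >= cfg.min_tier1 && t2 >= cfg.min_tier2) {
            Verdict::Fail
        } else if t1 + t2 + t3 > 0 {
            Verdict::Warning
        } else {
            Verdict::Clean
        }
    }
}

/// Totals over a batch of texts.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Summary {
    pub fails: usize,
    pub warnings: usize,
}

/// Run the same prepared detector over many texts and tally the verdicts.
pub fn summarize(detector: &Detector, texts: &[&str]) -> Summary {
    let mut summary = Summary::default();
    for text in texts {
        match detector.check(text) {
            Verdict::Fail => summary.fails += 1,
            Verdict::Warning => summary.warnings += 1,
            Verdict::Clean => {}
        }
    }
    summary
}
```

Keeping the thresholds in a separate `Config` value means the fail conditions can be tightened or relaxed without rebuilding the pattern lists.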

The current Rust implementation does not yet support correlating tier 1 and tier 2 patterns only within a certain distance: as of right now, any tier 1 pattern together with any tier 2 pattern triggers the flagging, wherever they occur in the text. This means that dtxt-detect will struggle with large text chunks, where unrelated matches far apart can combine into a flag, but we are already planning to implement this feature.
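For illustration only, the planned distance check could look roughly like the sketch below. It is not the crate's implementation. It assumes the text is already normalized, that distance is measured in bytes between the start offsets of two matches (the README does not specify the unit of q), and the function names `positions`, `within_q` and `correlated` are hypothetical.

```rust
/// Sorted byte offsets at which any of the given patterns starts in `text`.
fn positions(text: &str, patterns: &[&str]) -> Vec<usize> {
    let mut out = Vec::new();
    for pattern in patterns {
        for (index, _) in text.match_indices(*pattern) {
            out.push(index);
        }
    }
    out.sort_unstable();
    out
}

/// True if some offset in `a` and some offset in `b` are at most `q` apart.
/// Both slices must be sorted; a two-pointer sweep always advances the
/// smaller offset, so it visits the closest pair.
pub fn within_q(a: &[usize], b: &[usize], q: usize) -> bool {
    let (mut i, mut j) = (0, 0);
    while i < a.len() && j < b.len() {
        if a[i].abs_diff(b[j]) <= q {
            return true;
        }
        if a[i] < b[j] {
            i += 1;
        } else {
            j += 1;
        }
    }
    false
}

/// True if a tier 1 match and a tier 2 match start within `q` bytes of
/// each other.
pub fn correlated(text: &str, tier1: &[&str], tier2: &[&str], q: usize) -> bool {
    within_q(&positions(text, tier1), &positions(text, tier2), q)
}
```

With a check like this, a long document that mentions "build" in its first paragraph and "bomb" several hundred bytes later would no longer be flagged for a small q, while a short sentence containing both still would.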

Future milestones

  • Implement q-distance in the Rust library
  • Export and publish a Python package
  • Make a ready-to-use dataset for dtxt-detect