| Crates.io | dtxt-detect |
| --- | --- |
| lib.rs | dtxt-detect |
| version | 1.0.0 |
| created_at | 2025-09-11 16:42:47.821597+00 |
| updated_at | 2025-09-11 16:42:47.821597+00 |
| description | Rust library for dangerous text detection, optimized for high speeds. |
| homepage | https://github.com/gi-dellav/dtxt-detect |
| repository | https://github.com/gi-dellav/dtxt-detect |
| max_upload_size | |
| id | 1834049 |
| size | 12,119 |
Novel algorithm and Rust implementation for dangerous text detection, optimized for high-throughput workloads such as LLM inference or dataset evaluation.
dtxt-detect works by splitting the set of key patterns into three tiers:

- tier 1 patterns (e.g. "build", "how to");
- tier 2 patterns (e.g. "bomb", "violence");
- tier 3 patterns, which are enough on their own to flag a text.

By default, dtxt-detect normalizes the input string (to prevent Unicode-based exploits) and counts the tier 1, tier 2 and tier 3 patterns it contains. The text is flagged as dangerous if it contains at least one tier 1 pattern and one tier 2 pattern within a certain distance of each other (the q value), or at least one tier 3 pattern. Parameters such as the number of patterns required to flag a text, or the conditions that must be met, can be configured.
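The detection rule above can be sketched as follows. This is a minimal, hypothetical illustration, not the crate's API: `normalize`, `positions` and `is_dangerous` are illustrative names, the normalization step is a small stand-in (lowercasing, folding full-width ASCII, dropping zero-width characters) for full Unicode normalization such as NFKC, the distance q is assumed to be measured in bytes between match starts, and `"forbidden-phrase"` is a placeholder tier 3 pattern.

```rust
/// Hypothetical stand-in for Unicode normalization (a real implementation
/// would more likely apply NFKC): drops zero-width characters, folds
/// full-width ASCII (U+FF01..=U+FF5E) to plain ASCII, then lowercases.
fn normalize(input: &str) -> String {
    input
        .chars()
        .filter(|c| !matches!(*c, '\u{200B}' | '\u{200C}' | '\u{200D}' | '\u{FEFF}'))
        .map(|c| match c {
            '\u{FF01}'..='\u{FF5E}' => char::from_u32(c as u32 - 0xFEE0).unwrap_or(c),
            _ => c,
        })
        .flat_map(|c| c.to_lowercase())
        .collect()
}

/// Sorted byte offsets of every occurrence of every pattern in `text`.
fn positions(text: &str, patterns: &[&str]) -> Vec<usize> {
    let mut out = Vec::new();
    for &p in patterns {
        out.extend(text.match_indices(p).map(|(i, _)| i));
    }
    out.sort_unstable();
    out
}

/// Flags the text if any tier 3 pattern occurs, or if some tier 1 match
/// and some tier 2 match start at most `q` bytes apart (assumed metric).
fn is_dangerous(input: &str, tier1: &[&str], tier2: &[&str], tier3: &[&str], q: usize) -> bool {
    let text = normalize(input);
    if !positions(&text, tier3).is_empty() {
        return true;
    }
    let (t1, t2) = (positions(&text, tier1), positions(&text, tier2));
    // Simple nested check: quadratic in the number of matches.
    t1.iter().any(|&a| t2.iter().any(|&b| a.abs_diff(b) <= q))
}
```

For example, `is_dangerous("How to BUILD a ｂｏｍｂ", &["build", "how to"], &["bomb", "violence"], &["forbidden-phrase"], 20)` returns `true` because normalization folds the full-width letters, while the same tier 1 and tier 2 patterns separated by a few hundred characters are not flagged.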
The current Rust implementation uses SIMD-accelerated algorithms for pattern matching. It lets you build pattern lists once and reuse them across multiple executions, reports the number of fails (texts flagged as dangerous) and warnings (texts flagged as potentially dangerous), and lets you configure which conditions trigger a fail.
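The build-once, check-many workflow described above might look roughly like this. It is a hypothetical sketch, not the crate's API: `Detector`, `Conditions`, `Verdict` and `check_all` are illustrative names, `str::matches` stands in for the SIMD matcher, and the warning rule (a tier 2 match that does not meet the fail conditions) is an assumption, since the exact criteria for warnings are not specified here. Like the current implementation, it correlates tier 1 and tier 2 matches anywhere in the text.

```rust
/// Hypothetical, configurable fail thresholds (not the crate's API).
#[derive(Clone, Copy, Debug)]
struct Conditions {
    min_tier1: usize,
    min_tier2: usize,
    min_tier3: usize,
}

impl Default for Conditions {
    /// Mirrors the default rule: one tier 1 + one tier 2 pattern, or one tier 3 pattern.
    fn default() -> Self {
        Conditions { min_tier1: 1, min_tier2: 1, min_tier3: 1 }
    }
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Safe,
    Warning,
    Fail,
}

/// Pattern lists are prepared once at construction and reused for every check.
struct Detector {
    tiers: [Vec<String>; 3],
    conditions: Conditions,
}

fn lowered(patterns: &[&str]) -> Vec<String> {
    patterns.iter().map(|p| p.to_lowercase()).collect()
}

impl Detector {
    fn new(tier1: &[&str], tier2: &[&str], tier3: &[&str], conditions: Conditions) -> Self {
        Detector { tiers: [lowered(tier1), lowered(tier2), lowered(tier3)], conditions }
    }

    /// Counts non-overlapping matches of all patterns in one tier.
    /// `str::matches` is a simple stand-in for a SIMD multi-pattern matcher.
    fn count(&self, text: &str, tier: usize) -> usize {
        self.tiers[tier].iter().map(|p| text.matches(p.as_str()).count()).sum()
    }

    fn check(&self, input: &str) -> Verdict {
        let text = input.to_lowercase();
        let c = self.conditions;
        let (n1, n2, n3) = (self.count(&text, 0), self.count(&text, 1), self.count(&text, 2));
        // Tier 1 and tier 2 matches correlate anywhere in the text (no q window).
        if n3 >= c.min_tier3 || (n1 >= c.min_tier1 && n2 >= c.min_tier2) {
            Verdict::Fail
        } else if n2 > 0 {
            // Hypothetical warning rule: a tier 2 match that does not meet the fail conditions.
            Verdict::Warning
        } else {
            Verdict::Safe
        }
    }

    /// Checks a batch of texts and returns (fails, warnings).
    fn check_all<'a, I>(&self, texts: I) -> (usize, usize)
    where
        I: IntoIterator<Item = &'a str>,
    {
        let (mut fails, mut warnings) = (0, 0);
        for text in texts {
            match self.check(text) {
                Verdict::Fail => fails += 1,
                Verdict::Warning => warnings += 1,
                Verdict::Safe => {}
            }
        }
        (fails, warnings)
    }
}
```

Building the detector once amortizes pattern preparation across many checks; with a real multi-pattern matcher (for instance an Aho-Corasick automaton) that construction step is typically the expensive part, which is why reusing it matters for dataset-scale evaluation.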
The current Rust implementation does not yet support correlating tier 1 and tier 2 patterns only within a certain distance (the q value): right now, any tier 1 pattern together with any tier 2 pattern anywhere in the text triggers a flag. As a result, dtxt-detect struggles with large text chunks, where unrelated tier 1 and tier 2 patterns are more likely to co-occur; we are already planning to implement this feature.
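One possible way to implement the planned distance check, assuming the matcher reports sorted start offsets for each tier, is a linear two-pointer scan. This is a hypothetical sketch, not the planned implementation: `offsets`, `flags_anywhere` and `any_within` are illustrative names, and q is again assumed to be a byte distance between match starts. It contrasts the current "anywhere" behaviour with the windowed one on a large chunk.

```rust
/// Sorted byte offsets of every occurrence of every pattern in `text`.
fn offsets(text: &str, patterns: &[&str]) -> Vec<usize> {
    let mut v = Vec::new();
    for &p in patterns {
        v.extend(text.match_indices(p).map(|(i, _)| i));
    }
    v.sort_unstable();
    v
}

/// Current behaviour: any tier 1 match plus any tier 2 match flags the text.
fn flags_anywhere(tier1: &[usize], tier2: &[usize]) -> bool {
    !tier1.is_empty() && !tier2.is_empty()
}

/// Windowed behaviour: true if some offset in `a` and some offset in `b`
/// are at most `q` apart. Both slices must be sorted ascending.
/// Runs in O(a.len() + b.len()).
fn any_within(a: &[usize], b: &[usize], q: usize) -> bool {
    let (mut i, mut j) = (0, 0);
    while i < a.len() && j < b.len() {
        if a[i].abs_diff(b[j]) <= q {
            return true;
        }
        // Advance the smaller offset: every later offset in the other slice
        // is even farther from it, so it cannot form a closer pair.
        if a[i] < b[j] {
            i += 1;
        } else {
            j += 1;
        }
    }
    false
}
```

With `t1 = ["build", "how to"]` and `t2 = ["bomb", "violence"]`, a text consisting of `"how to"`, 500 spaces and `"bomb"` is flagged by `flags_anywhere` but not by `any_within(.., .., 50)`, which is the kind of false positive on large chunks that the distance constraint is meant to avoid.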