Crates.io | zalgo-codec |
lib.rs | zalgo-codec |
version | 0.13.1 |
source | src |
created_at | 2022-11-17 15:31:58.864162 |
updated_at | 2024-11-23 18:13:52.589561 |
description | Convert an ASCII text string into a single unicode grapheme cluster and back. Provides a macro for embedding Rust source code that has been encoded in this way. |
repository | https://github.com/JSorngard/zalgo_codec/tree/main/codec |
id | 717329 |
size | 155,206 |
This crate lets you convert an ASCII text string into a single unicode grapheme cluster and back. It also provides a procedural macro that lets you take source code that's been converted into such a grapheme cluster and compile it as if it was never zalgo-ified. This lets you reach new lows in the field of self-documenting code.
The encoded string is about twice the size of the original in bytes: 2n + 1 bytes for an input of n bytes.
Additionally, the crate provides a function that encodes Python code and wraps the result in a decoder, which decodes and executes it so that the output retains the functionality of the original code.
Encode a string to a grapheme cluster with zalgo_encode:
let s = "Zalgo";
let encoded = zalgo_encode(s)?;
assert_eq!(encoded, "É̺͇͌͏");
Decode a grapheme cluster back into a string with zalgo_decode:
let encoded = "É̺͇͌͏";
let s = zalgo_decode(encoded)?;
assert_eq!(s, "Zalgo");
The ZalgoString type can be used to encode a string and handle the result in
various ways:
let s = "Zalgo";
let zstr = ZalgoString::new(s)?;
assert_eq!(zstr, "É̺͇͌͏");
assert_eq!(zstr.len(), 2 * s.len() + 1);
assert_eq!(zstr.decoded_len(), s.len());
assert_eq!(zstr.bytes().next(), Some(69));
assert_eq!(zstr.decoded_chars().next_back(), Some('o'));
We can execute zalgo-encoded Rust code with the macro zalgo_embed!:
// This expands to the code
// `fn add(x: i32, y: i32) -> i32 {x + y}`
zalgo_embed!("E͎͉͙͉̞͉͙͆̀́̈́̈́̈̀̓̒̌̀̀̓̒̉̀̍̀̓̒̀͛̀̋̀͘̚̚͘͝");
// The `add` function is now available
assert_eq!(add(10, 20), 30);
as well as evaluate expressions:
let x = 20;
let y = -10;
// This expands to the code
// `x + y`
let z = zalgo_embed!("È͙̋̀͘");
assert_eq!(z, x + y);
We can also do the opposite of obfstr: obfuscate a string while coding and deobfuscate it at compile time:
let secret_string = zalgo_embed!("Ê̤͏͎͔͔͈͉͓͍̇̀͒́̈́̀̀ͅ͏͍́̂");
assert_eq!(secret_string, "Don't read this mom!");
The cursed character at the bottom of this section is the standard "Lorem ipsum" encoded with the encoding function in this crate.
Characters U+0300–U+036F are the combining characters for Unicode Latin (the Combining Diacritical Marks block).
The fun thing about combining characters is that you can add as many of them
as you like to a base character without creating any new symbols;
each one only adds a mark on top of the character. They are meant to be used
to create characters such as á by taking a normal a and adding another
character to give it the mark (U+0301, in this case).
Fun fact: Unicode doesn't specify any limit on the number of these characters.
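The stacking behaviour described above can be checked with a few lines of standard-library Rust. This is only an illustrative sketch: the helper name stack_marks is made up for this example and is not part of the crate.

```rust
/// Hypothetical helper: append every combining mark in U+0300..=U+036F
/// to a single base character.
fn stack_marks(base: char) -> String {
    let mut stacked = String::from(base);
    stacked.extend((0x300u32..=0x36F).filter_map(char::from_u32));
    stacked
}

fn main() {
    // A plain `a` followed by U+0301 (combining acute accent).
    let accented = "a\u{301}";
    // Two code points and three UTF-8 bytes, but it renders as one glyph.
    assert_eq!(accented.chars().count(), 2);
    assert_eq!(accented.len(), 3);

    // One base character carrying all 112 marks of the block.
    let stacked = stack_marks('a');
    assert_eq!(stacked.chars().count(), 1 + 112);
    // Every mark in this range takes two bytes in UTF-8.
    assert_eq!(stacked.len(), 1 + 2 * 112);
    println!("{stacked}");
}
```

Each mark costs two bytes in UTF-8, which is where the roughly twofold size increase of the encoded text comes from.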
Conveniently, this gives us 112 different characters we can map to,
which is more than enough for the ASCII character range 0x20 -> 0x7F (96 values),
aka all the non-control characters plus DEL. The only issue is that we can't have new lines
in this system, so to fix that, we can simply map 0x7F (DEL) to 0x0A (LF).
This can be represented as (CHARACTER - 11) % 133 - 21, and decoded with
(CHARACTER + 22) % 133 + 10, where % is the modulo operator as in Python, whose result is never negative for a positive divisor.
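To make the two formulas concrete, here is a minimal Rust sketch of the mapping. It is not the crate's implementation: the function names are hypothetical, and it assumes that each resulting offset is added to U+0300 and that the cluster starts with the base character E (byte 69, as the ZalgoString example suggests). Because the formulas rely on Python's modulo, the encoding step uses rem_euclid instead of Rust's %, which would return -1 for a line feed.

```rust
/// Map a printable ASCII byte or b'\n' to an offset in 0..112.
/// Hypothetical helper, not part of the crate's API.
fn encode_byte(byte: u8) -> Option<u8> {
    if byte == b'\n' || (0x20..=0x7E).contains(&byte) {
        // rem_euclid mirrors Python's `%`: (10 - 11) wraps around to 132.
        Some(((i16::from(byte) - 11).rem_euclid(133) - 21) as u8)
    } else {
        None
    }
}

/// Inverse of `encode_byte`; rejects offsets the encoder never produces.
fn decode_offset(offset: u8) -> Option<u8> {
    if offset <= 94 || offset == 111 {
        Some(((i16::from(offset) + 22) % 133 + 10) as u8)
    } else {
        None
    }
}

/// Assumed layout: base character `E`, then one mark per input byte.
fn encode(text: &str) -> Option<String> {
    let mut out = String::with_capacity(2 * text.len() + 1);
    out.push('E');
    for byte in text.bytes() {
        let offset = u32::from(encode_byte(byte)?);
        out.push(char::from_u32(0x300 + offset)?);
    }
    Some(out)
}

/// Skips the base character and maps every mark back to its ASCII byte.
fn decode(encoded: &str) -> Option<String> {
    encoded
        .chars()
        .skip(1)
        .map(|c| {
            let offset = (c as u32).checked_sub(0x300).filter(|o| *o < 112)?;
            Some(char::from(decode_offset(offset as u8)?))
        })
        .collect()
}

fn main() {
    let encoded = encode("Zalgo").expect("input is printable ASCII");
    // One byte for `E` plus two bytes per input byte.
    assert_eq!(encoded.len(), 2 * "Zalgo".len() + 1);
    assert_eq!(decode(&encoded).as_deref(), Some("Zalgo"));
}
```

With these formulas the printable bytes 0x20 to 0x7E land on offsets 0 to 94 and the line feed lands on offset 111, so every offset fits inside the 112-character range.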
There is an executable available for experimenting with the codec on text and files.
It can also be used to generate grapheme clusters from source code for use with zalgo_embed!.
It can be installed with cargo install zalgo-codec --features binary.
You can optionally enable the gui feature during installation to include a
rudimentary GUI mode for the program.
The crate is based on the encoding and decoding functions originally written in Python by Scott Conner. They were first presented in this post together with the above explanation.