Crates.io | sigmah |
lib.rs | sigmah |
version | |
source | src |
created_at | 2024-10-30 11:01:02.633239 |
updated_at | 2024-11-02 02:53:29.216275 |
description | Create and scan binary signature in Rust efficiently |
homepage | |
repository | https://github.com/stevefan1999-personal/sigmah |
max_upload_size | |
id | 1428376 |
Cargo.toml error: | TOML parse error at line 18, column 1 | 18 | autolib = false | ^^^^^^^ unknown field `autolib`, expected one of `name`, `version`, `edition`, `authors`, `description`, `readme`, `license`, `repository`, `homepage`, `documentation`, `build`, `resolver`, `links`, `default-run`, `default_dash_run`, `rust-version`, `rust_dash_version`, `rust_version`, `license-file`, `license_dash_file`, `license_file`, `licenseFile`, `license_capital_file`, `forced-target`, `forced_dash_target`, `autobins`, `autotests`, `autoexamples`, `autobenches`, `publish`, `metadata`, `keywords`, `categories`, `exclude`, `include` |
size | 0 |
A blatantly simple Rust crate that creates binary signatures efficiently, which can be stored on the stack or as a const item (so you can store the pattern without putting it on the heap!).
It also generates a scanner function, using either a best-effort naive linear scan, or runtime SIMD selection as the backend if the feature is available, which also includes a scalar fallback (thank you, portable SIMD <3).
The generated function is internalized, meaning the signature itself is embedded directly as part of the opcode without much preprocessing.
This should directly beat many FindPattern benchmarks while maintaining decent performance.
The mask is stuffed into a bit array with precise byte storage, each mask entry encoded directly at the bit level. This can be considered a bit hack, given that we hard-code the conditions:

- 0 => Optional/don't care => Result = 1
- 1 => Match => Pattern byte at the current position must equal the data byte at the current position
Thus, we want to compute the following:

```
pat:    [1, 2, 3, 4]
data:   [1, 1, 3, 3]
mask:   [1, 0, 1, 1]  # 1 = must match, 0 = optional/don't care
result: [1, 1, 1, 0]
```
For example, given a pattern of "\x01\x02\x03\x04\x05\x06" and a mask of "xx??x?", we transform the mask to 0b110010 (which can be done in Rust constant evaluation with compiler fuckery), aligned to the closest byte boundary, which is (6+7)//8 = 1 byte (i.e. ceiling division), then we just append it back to the pattern as-is, which makes it insanely more compact than a typical byte-string-based pattern. Note that endianness does matter here, so you need to declare either LSB-0 (little endian) or MSB-0 (big endian) if you want to deliver it across a network boundary.
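The mask-to-bits transform above can be sketched in const-evaluable Rust. This is a hypothetical helper (not sigmah's actual API), using the LSB-0 convention, so the six bits written MSB-first as `0b110010` above come out here as `0b010011`:

```rust
// Pack an "x"/"?" mask string into a bit array at compile time.
// 'x' = must match (bit set), '?' = don't care (bit clear). LSB-0 order:
// bit (i % 8) of byte (i / 8) corresponds to mask position i.
const fn pack_mask<const M: usize>(mask: &[u8]) -> [u8; M] {
    let mut out = [0u8; M];
    let mut i = 0;
    while i < mask.len() {
        if mask[i] == b'x' {
            out[i / 8] |= 1u8 << (i % 8); // set the "must match" bit
        }
        i += 1;
    }
    out
}

// "xx??x?" -> bits [1, 1, 0, 0, 1, 0] -> 0b010011 in LSB-0 order,
// padded to (6 + 7) / 8 = 1 byte.
const MASK: [u8; 1] = pack_mask(b"xx??x?");
```

Reading the same byte MSB-first recovers the `0b110010` from the text; which convention you pick only matters once the signature crosses a serialization boundary.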
If you check the pattern in IDA, you'll notice the bytes and comparison operations are perfectly reversed -- this is totally normal, because you are most likely running on a little-endian system known as x86, dumbass.
By appending the binary mask back to the pattern, we achieve a more compact representation. Instead of storing both the pattern and mask in full, this method reduces the storage requirement to just the pattern length plus the mask overhead. For a 16-byte pattern, the mask overhead is only 2 bytes (16/8), significantly less than the typical 2N bytes where each byte of the pattern and mask is stored separately. This is much more efficient, although quite a bit harder to build, since a lot of brain damage is required to make this happen in const context at the same time.
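To make the size claim concrete, here is a hypothetical layout sketch (not sigmah's actual type) with the bit-packed mask riding along right after the pattern:

```rust
// A signature over N pattern bytes costs N + (N + 7) / 8 bytes,
// instead of the 2 * N of a separate byte-per-byte mask.
struct Signature<const N: usize, const M: usize> {
    pattern: [u8; N],   // the N pattern bytes
    mask_bits: [u8; M], // M = (N + 7) / 8, one bit per pattern byte
}

// 16-byte pattern: 16 + 2 = 18 bytes instead of 32.
const SIG16_SIZE: usize = core::mem::size_of::<Signature<16, 2>>();
```

Note that `M` has to be supplied by hand here; computing `(N + 7) / 8` inside the type is exactly what requires `generic_const_exprs`, as discussed further down.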
Given the truth table above, this is the first algorithm that comes to mind (in pseudocode):
```python
for i in range(len(pat)):
    result[i] = (pat[i] == data[i]) if mask[i] else True
all(result)
```
And with a little bit of wit, we can turn this ternary into a binary logical operation:
```python
for i in range(len(pat)):
    result[i] = (not mask[i]) or pat[i] == data[i]
all(result)
```
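The binary form translates directly into runnable Rust; a minimal sketch (function name is illustrative, not sigmah's API):

```rust
// A don't-care position (mask = false) always passes; a masked position
// must compare equal. Equivalent to `!mask[i] || pat[i] == data[i]`.
fn signature_matches(pat: &[u8], mask: &[bool], data: &[u8]) -> bool {
    pat.iter()
        .zip(data)
        .zip(mask)
        .all(|((&p, &d), &m)| !m || p == d)
}
```

Running it on the example table: `pat = [1, 2, 3, 4]`, `mask = [1, 0, 1, 1]`, `data = [1, 1, 3, 3]` fails on the last position, exactly as `result = [1, 1, 1, 0]` predicts.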
If you want to argue further:
```
for i in range(len(pat)):
    result[i] = mask[i] IMPLIES (pat[i] == data[i])
all(result)
```
Here, "IMPLIES" represents logical implication (A implies B is true unless A is true and B is false). This simplifies the check conceptually but complicates the implementation, and there is likely no way to encode IMPLIES directly in SIMD at the moment -- at least none that I'm aware of in the wild.
`VPTERNLOGD` could be handy here, so I left the original function available as well if you'd like to experiment with it. At the moment, it generates the same code as the binary logical one, i.e. it is also automatically optimized. Until the LLVM team fully juices out their brain power and figures out how to emit `VPTERNLOGD`, treat it as something you shouldn't use in general (though why wouldn't you just fetch the result from the AVX512 mask register at that point?).
Currently, the Rust language feature known as `generic_const_exprs` plays a critical role here, especially for scenarios involving precise byte storage. It allows const expressions in generic parameters, which is essential when defining types whose size in bytes must be computed exactly (for example, `(N + 7) / 8` mask bytes for an `N`-byte pattern). However, this capability is still incomplete, leading to several limitations and challenges:

1. **Incompleteness of `generic_const_exprs`**: `generic_const_exprs` is an experimental feature available only on nightly builds of Rust. It enables developers to use expressions in generic parameters, which is vital for the precise byte alignment of the data structures used here.
2. **Workarounds and their drawbacks**: The usual workaround for missing `generic_const_exprs` support involves using macros to generate the necessary code. Macros can create structures with predefined byte sizes (for instance, one concrete type per fixed width such as `u32` or `u64`), but this approach does not scale: every supported size needs its own expansion.
3. **Current compatibility**: The crate currently compiles with `cargo 1.84.0-nightly` from October 25, 2024. This indicates a temporary robustness in the feature's implementation, but does not address its long-term stability.
4. **Future outlook**: The `generic_const_exprs` feature has seen little progress over the years, raising concerns about its maintenance and future compatibility. Without active development, there is a risk of "bit rot", where the code might become outdated or fail to integrate with new language features or standards. More promising are `min_generic_const_exprs` and similar proposals, which could offer a more stable and safer approach to const generics, possibly deprecating `generic_const_exprs` in favor of more robust solutions. I'm keenly watching for advancements that could either enhance this feature or replace it with something more fitting for long-term use.
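The macro workaround mentioned above can be sketched like this (hypothetical, not sigmah's real API): on stable Rust you cannot write `[u8; (N + 7) / 8]` inside a type, so a macro stamps out one concrete type per supported length, with the mask size supplied by hand.

```rust
// One invocation per signature length; the caller must compute
// ($len + 7) / 8 themselves, and nothing checks that they got it right.
macro_rules! signature_type {
    ($name:ident, $len:expr, $mask_bytes:expr) => {
        struct $name {
            pattern: [u8; $len],
            mask: [u8; $mask_bytes], // must equal ($len + 7) / 8, unchecked
        }
    };
}

signature_type!(Sig16, 16, 2); // drawback: every size needs its own line
signature_type!(Sig32, 32, 4);
```

This compiles on stable, but it trades one nightly feature for a combinatorial pile of near-identical types, which is exactly the drawback described above.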
The SIMD backend requires nightly for `portable_simd` (obviously), as well as `avx512_target_feature` for a nifty AVX512BW hack:
```c
bool __fastcall sub_140001C86(__int64 a1, __int64 a2, __int64 _R8)
{
  bool result; // al
  __asm
  {
    vmovdqu64 zmm0, zmmword ptr [r8]
    vpcmpb    k0, zmm0, zmmword ptr [rcx], 4
    kmovq     rax, k0
  }
  result = (a2 & _RAX) == 0;
  __asm { vzeroupper }
  return result;
}
```
Notice the `vmovdqu64`, `vpcmpb` and `kmovq` instructions, all of which require more than AVX512F. Since this reads directly from the mask register, it should kill the competition at insane speed thanks to its compact and direct design. But this is gated behind `avx512_target_feature` at the moment, which is expected to be stabilized in the second half of 2025. The AVX2 code generated has variable length depending on the signature size, so it is not shown as a snippet here, but I strongly suggest checking the result out in IDA as well.
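A scalar model of what the AVX512BW snippet computes, in stable Rust: `vpcmpb` with predicate 4 (not-equal) produces a per-byte "mismatch" bitmask in a k register, and the final test is simply `mismatch & mask == 0`. A `u64` stands in for the 64-bit k register here (assumed semantics, for illustration only):

```rust
// pat/data are one 64-byte (zmm-width) chunk; mask has bit i set when
// byte i must match (the "x" positions), clear for don't-care ("?").
fn matches_via_bitmask(pat: &[u8; 64], data: &[u8; 64], mask: u64) -> bool {
    let mut mismatch: u64 = 0;
    for i in 0..64 {
        if pat[i] != data[i] {
            mismatch |= 1u64 << i; // vpcmpb ..., 4 (CMP_NEQ): byte i differs
        }
    }
    // kmovq + test: a mismatch only matters where the mask demands a match.
    mismatch & mask == 0
}
```

This is the same `!mask || equal` logic from the pseudocode earlier, just evaluated 64 lanes at a time as a single bitwise AND against the mask word.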
TODO