primitives

Crates.io	primitives
lib.rs	primitives
version	0.1.0
created_at	2025-05-11 14:57:11.276619+00
updated_at	2025-05-11 14:57:11.276619+00
description	Primitives Asm
id	1669467
size	170,758

malik (malik672)

Experimentation to Optimize Alloy Primitives

(Honestly, idk what I'm doing, but you never know what you're doing until you try, so who cares.) Don't tell anyone, but this is just an excuse for me to learn SIMD stuff (idk what it is).

What's This All About?

This project is a wild ride into the world of primitive optimization. We're throwing ideas at the wall and seeing what sticks.

Design Choice

The Grand Plan (or lack thereof)

Early support for ARM NEON
Take a look at Alloy primitives
Scratch our heads and wonder how to make them faster
Remember that SIMD exists and seems cool
Try to apply SIMD to... something. Anything, really.
See what happens and hope for the best
Implement vector operations
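
For the "apply SIMD to... something" step, here is a minimal sketch of what a NEON experiment could look like: comparing two 32-byte values with 128-bit loads instead of a byte loop. The names `eq32` and `eq32_scalar` are made up for illustration and are not taken from this crate. The NEON path only compiles on aarch64 targets with NEON enabled, and everything else falls back to the scalar loop.

```rust
/// Portable reference: compare two 32-byte values one byte at a time.
/// (Hypothetical helper, not part of this crate.)
pub fn eq32_scalar(a: &[u8; 32], b: &[u8; 32]) -> bool {
    a.iter().zip(b.iter()).all(|(x, y)| x == y)
}

/// NEON path: XOR both 16-byte halves and check that no bit is set.
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
pub fn eq32(a: &[u8; 32], b: &[u8; 32]) -> bool {
    use std::arch::aarch64::{veorq_u8, vld1q_u8, vmaxvq_u8, vorrq_u8};
    // SAFETY: this function is only compiled when NEON is enabled for the
    // target, and both pointers are valid for 32 bytes of reads.
    unsafe {
        let lo = veorq_u8(vld1q_u8(a.as_ptr()), vld1q_u8(b.as_ptr()));
        let hi = veorq_u8(
            vld1q_u8(a.as_ptr().add(16)),
            vld1q_u8(b.as_ptr().add(16)),
        );
        // OR the two difference vectors together; the horizontal maximum
        // is 0 only if every byte matched.
        vmaxvq_u8(vorrq_u8(lo, hi)) == 0
    }
}

/// Fallback for every other target: just use the scalar loop.
#[cfg(not(all(target_arch = "aarch64", target_feature = "neon")))]
pub fn eq32(a: &[u8; 32], b: &[u8; 32]) -> bool {
    eq32_scalar(a, b)
}
```

Keeping the scalar version around is handy: it doubles as the correctness oracle when you check whether the SIMD path "explodes".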

How to Join This Chaos

Clone this repo
Install... things. (We'll figure out what exactly later)
Run some code and see if it explodes
If it doesn't explode, check if it's faster
If it's faster, celebrate! If not, pretend you meant to do that
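
The "check if it's faster" step can be sketched with nothing but the standard library. This is a rough timing helper, not the setup used for the numbers in this README; `time_per_call` and `sum_bytes` are hypothetical names, and a real benchmark harness (Criterion, for example) will give far more trustworthy results.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Run `f` `iters` times and return the average wall-clock time per call.
pub fn time_per_call<F: FnMut()>(iters: u32, mut f: F) -> Duration {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed() / iters
}

/// Hypothetical workload, used only so there is something to measure.
pub fn sum_bytes(data: &[u8]) -> u64 {
    data.iter().map(|&b| u64::from(b)).sum()
}

/// Time the workload on a 32-byte input and return the per-call average.
pub fn demo() -> Duration {
    let data = [0xABu8; 32];
    time_per_call(1_000_000, || {
        // `black_box` discourages the optimizer from deleting the work.
        black_box(sum_bytes(black_box(&data)));
    })
}
```

Run it in release mode (`cargo run --release`); debug builds say very little about SIMD code.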

Results So Far

Benchmark Comparison

Primitive	SIMD (primitives)	Alloy Primitives	Performance Change
address/checksum	169.43 ns	201.41 ns	Faster by ~19%
bytes/32	13.818 ns	15.818 ns	Faster by ~14%
bytes/64	14.614 ns	17.667 ns	Faster by ~21%
bytes/128	36.106 ns	36.859 ns	Slightly faster by ~2%
bytes/256	42.191 ns	41.024 ns	Slower by ~2.8%
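
The "Performance Change" figures above look like the time difference divided by the SIMD time. For bytes/64 that is (17.667 - 14.614) / 14.614 ≈ 21%. A small sketch of that arithmetic, with `percent_faster` as a hypothetical helper name:

```rust
/// Percentage by which the SIMD version beats the Alloy baseline,
/// measured relative to the SIMD time. Negative means SIMD is slower.
/// (Hypothetical helper; the formula is inferred from the table.)
pub fn percent_faster(simd_ns: f64, alloy_ns: f64) -> f64 {
    (alloy_ns - simd_ns) / simd_ns * 100.0
}
```

Calling `percent_faster(14.614, 17.667)` gives roughly 20.9, and `percent_faster(42.191, 41.024)` gives roughly -2.8, matching the bytes/64 and bytes/256 rows.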

SIMD Results for Parity Inversion

We've recently added SIMD optimizations for Parity inversion. Here are the benchmark results:

Input Size	Alloy Primitives	SIMD Version	Performance Change
10	21.235 ns	23.258 ns	Slower by ~9.53%
100	101.93 ns	85.767 ns	Faster by ~15%
1000	974.07 ns	722.84 ns	Faster by ~25%
10000	10.090 μs	8.007 μs	Faster by ~20%

Key observations:

SIMD shows overhead for very small inputs (stick with the non-SIMD version there)
Performance gains become significant for larger inputs (1000+)
Consistent ~20-25% improvement for large inputs
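
One way to act on these observations is to pick the code path by input length. The sketch below is an assumption-heavy illustration, not this crate's code: it models a parity as a `u8` holding 0 or 1, the cutoff of 64 is a guess between the 10 and 100 rows of the table, and the "wide" path is a plain chunked loop standing in for a real SIMD kernel.

```rust
/// Below this length the plain loop is used. Hypothetical cutoff:
/// measure on your own hardware before trusting it.
const WIDE_THRESHOLD: usize = 64;

/// Plain loop: flip the low bit of every byte (0 <-> 1).
pub fn invert_scalar(parities: &mut [u8]) {
    for p in parities.iter_mut() {
        *p ^= 1;
    }
}

/// Stand-in for a SIMD kernel: process fixed 16-byte blocks, then
/// finish the leftover tail with the scalar loop.
pub fn invert_wide(parities: &mut [u8]) {
    let mut chunks = parities.chunks_exact_mut(16);
    for chunk in &mut chunks {
        for p in chunk.iter_mut() {
            *p ^= 1;
        }
    }
    invert_scalar(chunks.into_remainder());
}

/// Dispatch on length: small inputs skip the wide path and its setup cost.
pub fn invert(parities: &mut [u8]) {
    if parities.len() < WIDE_THRESHOLD {
        invert_scalar(parities);
    } else {
        invert_wide(parities);
    }
}
```

Both paths must produce identical results, so the scalar loop can be used to test the wide one on random inputs of every length around the threshold.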

Contribute

Got ideas? Throw them in! Know what you're doing? Even better, we could use the help!

License

MIT
