Crates.io | simd_aligned |
lib.rs | simd_aligned |
version | 0.6.1 |
created_at | 2018-08-09 09:10:05.18335+00 |
updated_at | 2024-12-14 14:49:27.331809+00 |
description | Safe and fast SIMD-aligned data structures with easy and transparent 'flat' access. |
homepage | |
repository | https://github.com/ralfbiedert/simd_aligned_rust |
max_upload_size | |
id | 78480 |
size | 33,654 |
You want to use safe SIMD datatypes from wide, but realized there is no simple, safe and fast way to align your f32x4 (and friends) in memory and treat them as regular f32 slices for easy loading and manipulation; simd_aligned to the rescue.
Highlights:

- Built on top of wide for easy data handling
- Supports everything from u8x16 to f64x4
- Think in flat slices (&[f32]), but get the performance of properly aligned SIMD vectors (&[f32x4])
- Provides the N-dimensional VecSimd and the NxM-dimensional MatSimd
The following example produces a vector that can hold 10 elements of type f64. All elements are guaranteed to be properly aligned for fast access.
use simd_aligned::*;
// Create vectors of `10` f64 elements with value `0.0`.
let mut v1 = VecSimd::<f64x4>::with(0.0, 10);
let mut v2 = VecSimd::<f64x4>::with(0.0, 10);
// Get "flat", mutable view of the vector, and set individual elements:
let v1_m = v1.flat_mut();
let v2_m = v2.flat_mut();
// Set some elements on v1
v1_m[0] = 0.0;
v1_m[4] = 4.0;
v1_m[8] = 8.0;
// Set some others on v2
v2_m[1] = 0.0;
v2_m[5] = 5.0;
v2_m[9] = 9.0;
let mut sum = f64x4::splat(0.0);
// Eventually, do something with the actual SIMD types. This performs
// SIMD vector math, e.g., f64x4 + f64x4 in one operation:
sum = v1[0] + v2[0];
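To show how scalar results come back out of the SIMD side, here is a minimal, self-contained sketch in the same spirit as the example above. It uses only the accessors already shown, plus wide's to_array() for extracting the individual lanes, which is an assumption and not part of the example above.

use simd_aligned::*;

// Fill a small aligned vector, do one SIMD addition, then reduce the four
// lanes of the result back to a single scalar.
let mut v = VecSimd::<f64x4>::with(1.0, 8);

let v_m = v.flat_mut();
v_m[3] = 5.0;

// SIMD math on the first aligned block: [1, 1, 1, 5] + [1, 1, 1, 5].
let doubled = v[0] + v[0];

// Horizontal reduction of the lanes into one scalar
// (NOTE: `to_array()` from wide is an assumption here).
let total: f64 = doubled.to_array().iter().sum();
assert_eq!(total, 16.0);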
There is no performance penalty for using simd_aligned, while retaining all the simplicity of handling flat arrays.
test vectors::packed        ... bench:    77 ns/iter (+/- 4)
test vectors::scalar        ... bench: 1,177 ns/iter (+/- 464)
test vectors::simd_aligned  ... bench:    71 ns/iter (+/- 5)
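For context, the difference those rows measure is roughly scalar element-wise work versus work on properly aligned SIMD blocks. The sketch below illustrates that distinction only; it is not the crate's actual benchmark code.

use simd_aligned::*;

// "scalar": one f64 addition per element of a flat slice.
fn add_scalar(a: &[f64], b: &[f64], out: &mut [f64]) {
    for i in 0..a.len() {
        out[i] = a[i] + b[i];
    }
}

// "packed" / "simd_aligned": one f64x4 addition per properly aligned block,
// processing four lanes per operation.
fn add_packed(a: &[f64x4], b: &[f64x4], out: &mut [f64x4]) {
    for i in 0..a.len() {
        out[i] = a[i] + b[i];
    }
}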
How does simd_aligned relate to std::simd and faster?

- simd_aligned builds on top of std::simd. It aims to provide common, SIMD-aligned data structures that support simple and safe scalar access patterns.
- faster (as of today) is good if you already have existing flat slices in your code and want to operate on them "full SIMD ahead". However, particularly when dealing with multiple slices at the same time (e.g., kernel computations), the performance impact of unaligned arrays can become more noticeable (e.g., in the case of ffsvm, up to 10% - 20%).