Crates.io | atoi_radix10 |
lib.rs | atoi_radix10 |
version | 0.0.1 |
source | src |
created_at | 2021-06-27 13:08:08.254169 |
updated_at | 2021-06-27 13:08:08.254169 |
description | Parse strings as base 10 integers quickly (especially u64/u128) |
homepage | |
repository | https://github.com/gilescope/atoi_radix10 |
max_upload_size | |
id | 415433 |
size | 204,307 |
This repository is the rust port of @KholdStare experimentation in https://kholdstare.github.io/technical/2020/05/26/faster-integer-parsing.html
There is also a blog post on this in Rust Malaysia. https://github.com/rust-malaysia/rust-malaysia.github.io/blob/master/_posts/2020-07-12-faster-integer-parsing.md
From discussions on reddit, it turns out someone else has stumbled on the exact same ideas before @KholdStare, Wojciech Muła. http://0x80.pl/articles/simd-parsing-int-sequences.html
Notes:
Spec:
+/- 0000000000000 then digits.
Goals:
An exploration of the fastest way to parse numbers without reading memory that you don't own. (Once happy with the result we can try and de-unsafe as much as possible while keeping the performance.)
We try to obey the rule of small numbers and make sure single digit numbers are especially fast, and in general all numbers will be parsed faster than std.
If you have to parse u128 and i128 numbers this crate does any number in under 25ns (and if you target a specific cpu with avx then maybe all under 15ns-20ns). It is hands down many many times faster than std rust (especially i128) across all the numbers.
For u8/i8 it's about the same as std.
For u16, u32, u64 it's around 1/3 faster than std.
nightly
and simd
features for highest speed (and target your specific cpu).
std
feature required if you want to run all the tests.
It's no_std
by default and will parse from any [u8]
slice.
This is called SWAR: Simd within a register.
+
further up before it was accessed due to latency requirements.if cond { return a as u16 }
, you can calculate the a as u16
before the if then it's faster. (We're optimising for the happy path).map_err(|_| ())
if you don't care what type of parse error it is.Things that didn't seem to have any effect:
Should I use this in production? As it's a new crate I would treat it with caution,
there could still be a bug or two lurking despite tests and some light fuzzing. It's had one reddit review
from which I've hopefully plugged a soundness hole around alignment.
Why not include this in the std library? The std library parsing code isn't radix specific. As the num parsing is in core, code size is important for embedded systems. This implementation is definitely more code than std.
If you want to run the tests under wasm you need to install:
cargo install wasm-bindgen-cli
cargo test --features nightly --target=wasm32-unknown-unknown
We can run the tests on big-endien via MIRI:
rustup +nightly component add miri
MIRIFLAGS="-Zmiri-symbolic-alignment-check" cargo miri test --target mips64-unknown-linux-gnuabi64