refset

Crates.io	refset
lib.rs	refset
version	0.2.0
created_at	2020-10-19 16:30:44.129041+00
updated_at	2020-10-22 01:54:32.255609+00
description	A non-owning HashSet
homepage
repository	https://git.flanchan.moe/flanchan/refset
max_upload_size
id	302965
size	25,434

Avril (notflan)

documentation

README

refset - non-owning `HashSet`

A hash-set analogue that does not own its data.

It can be used to "mark" items without the need to transfer ownership to the map

Example use case

/// Process arguments while ignoring duplicates
fn process_args(args: impl IntoIterator<Item=String>) {
  let mut same=  HashRefSet::new();
  for argument in args.into_iter()
  {
    if !same.insert(argument.as_str()) {
      // Already processed this input, ignore
      continue;
    }
    //do work...
  }
}

Serialisation support with `serde` crate

HashRefSet and HashType both implement Serialize and Deserialize from the serde crate if the serde feature is enabled. By default it is not.

Hashing

We use the SHA512 hashing algorithm for the implementation at present. I may implement the ability to choose different types, but as of now I think it is sufficient.

Drawbacks

Since the item is not inserted itself, we cannot use Eq to double check there was not a hash collision. While the hashing algorithm used (Sha512) is extremely unlikely to produce collisions, especially for small data types, keep in mind that it is not infallible.

Speed

HashRefSet is significantly slower than HashSet, so HashSet should be preferred in most cases. Even when Clone is required to insert into HashSet, it can be ~10x faster for trivial data structures. HashRefSet should be used if Clone is either not an option, or Clone is a significantly heavy operation on the type you're inserting.

Benchmark	Tests	Result
owning_strings	Inserts `String` into `HashSet` by cloning	~4,538 ns/iter
non_owning_strings	Inserts `str` into `HashRefSet` by reference	~48,271 ns/iter
owning_ints	Inserts `u32` into `HashSet` by copy	~937 ns/iter
non_owning_ints	Inserts `&u32` into `HashRefSet` by reference	~31,089 ns/iter

When to use over `HashSet`

The type you're inserting needs to be both in the set and moved elsewhere. (see exmaple)
Simply using Clone to insert a copy of the item into a HashSet is not possible (non-Clone type) or is a significantly heavy operation. (see benchmarks)
The fallibility of potential (albeing extremely unlikely) collisions of the SHA512 algorithm is not a concern
You need to insert an unsized type into a HashSet

Smallmap implementation

With the smallmap feature enabled, the small module also provides the same API as HashRefSet via SmallRefMap. It is backed by smallmap::Map instead of HashSet, which could potentially have some performance or memory usage impacts, or not.

The hashing algorithm and usage is otherwise identical for now, but this may change.

Benchmarks of `SmallRefMap`

Comparing with cloning or copying into smallmap::Map.

Largely there are the same performance penalties as the above table, with very minor differences.

Benchmark	Tests	Result
owning_strings	Inserts `String` into `SmallMap` by cloning	~3,096 ns/iter
non_owning_strings	Inserts `str` into `SmallRefMap` by reference	~47,302 ns/iter
owning_ints	Inserts `u32` into `SmallMap` by copy	~316 ns/iter
non_owning_ints	Inserts `&u32` into `SmallRefMap` by reference	~30,046 ns/iter

Each page of the SmallRefMap will consume at least 16kb of memory however. This may not be very desireable, but is still an available feature.

License

MIT

Commit count: 0