Crates.io | paralight |
lib.rs | paralight |
version | 0.0.3 |
source | src |
created_at | 2024-09-17 09:20:09.527728 |
updated_at | 2024-10-22 09:00:47.194012 |
description | A lightweight parallelism library for indexed structures |
repository | https://github.com/gendx/paralight |
id | 1377262 |
size | 184,824 |
This library allows you to distribute computation over slices among multiple threads. Each thread processes a subset of the items, and a final step reduces the outputs from all threads into a single result.
use paralight::iter::{IntoParallelIterator, ParallelIteratorExt};
use paralight::{CpuPinningPolicy, RangeStrategy, ThreadCount, ThreadPoolBuilder};

// Define thread pool parameters.
let pool_builder = ThreadPoolBuilder {
    num_threads: ThreadCount::AvailableParallelism,
    range_strategy: RangeStrategy::WorkStealing,
    cpu_pinning: CpuPinningPolicy::No,
};

// Create a scoped thread pool.
let sum = pool_builder.scope(
    |mut thread_pool| {
        // Compute the sum of a slice.
        let input = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
        input
            .par_iter(&mut thread_pool)
            .copied()
            .reduce(|| 0, |x, y| x + y)
    },
);
assert_eq!(sum, 5 * 11);
Note: In principle, Paralight could be extended to support inputs other than slices, as long as they are indexed, but for now only slices are supported. Come back to check when future versions are published!
The ThreadPoolBuilder provides an explicit way to configure your thread pool,
giving you fine-grained control over performance for your workload. There is no
default, which is deliberate because the suitable parameters depend on your
workload.
Paralight allows you to specify the number of worker threads to spawn in a
thread pool with the ThreadCount enum:
- AvailableParallelism uses the number of threads returned by the standard
  library's available_parallelism() function,
- Count(_) uses the specified number of threads, which must be non-zero.
For convenience, ThreadCount implements the TryFrom<usize> trait to create a
Count(_) instance, validating that the given number of threads is not zero.
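The two ways of choosing a thread count can be illustrated with the standard library alone. The following is a minimal sketch, not Paralight's implementation: resolve_thread_count is a hypothetical helper, and it only mirrors the ideas of querying available_parallelism() and rejecting a zero count up front.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical helper (not part of Paralight): resolve a requested thread
/// count, where `None` means "use the available parallelism".
fn resolve_thread_count(requested: Option<usize>) -> Result<NonZeroUsize, String> {
    match requested {
        // Same idea as `AvailableParallelism`: ask the standard library.
        None => thread::available_parallelism().map_err(|e| e.to_string()),
        // Same idea as `Count(_)`: a zero count is rejected before any thread is spawned.
        Some(n) => NonZeroUsize::new(n).ok_or_else(|| "thread count must be non-zero".to_string()),
    }
}

fn main() {
    assert_eq!(resolve_thread_count(Some(4)).unwrap().get(), 4);
    assert!(resolve_thread_count(Some(0)).is_err());
    // The detected value depends on the machine, so it is only printed.
    println!("available parallelism: {:?}", resolve_thread_count(None));
}
```

Validating the count when the configuration is built keeps the failure close to its cause, instead of surfacing later when the pool is used.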
Paralight offers two strategies in the RangeStrategy enum to distribute
computation among threads:
- Fixed splits the input evenly and hands out a fixed sequential range of items
  to each worker thread,
- WorkStealing starts with the fixed distribution, but lets each worker thread
  steal items from others once it is done processing its items.
Note: In work-stealing mode, each thread processes an arbitrary subset of items
in arbitrary order, meaning that the reduction operation must be both
commutative and associative to yield a deterministic result (in contrast to the
standard library's Iterator trait that processes items in order).
Fortunately, a lot of common operations are commutative and associative, but be
mindful of this.
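To see why the order matters, here is a small single-threaded sketch that uses only the standard library. The reduce_in_order helper is hypothetical: it simply folds the items in a caller-chosen order, imitating threads that pick up items in an arbitrary sequence. Addition gives the same answer for any order, while string concatenation (associative but not commutative) does not.

```rust
/// Hypothetical helper: reduce `items` in the order given by `order`
/// (a list of indices), imitating an arbitrary processing order.
fn reduce_in_order<T: Clone>(items: &[T], order: &[usize], init: T, op: impl Fn(T, T) -> T) -> T {
    order.iter().fold(init, |acc, &i| op(acc, items[i].clone()))
}

fn main() {
    let forward = [0, 1, 2, 3];
    let shuffled = [2, 0, 3, 1];

    // Addition is commutative and associative: every order yields 10.
    let numbers = [1, 2, 3, 4];
    assert_eq!(reduce_in_order(&numbers, &forward, 0, |x, y| x + y), 10);
    assert_eq!(reduce_in_order(&numbers, &shuffled, 0, |x, y| x + y), 10);

    // Concatenation is not commutative: the result depends on the order.
    let words = ["a".to_string(), "b".to_string(), "c".to_string(), "d".to_string()];
    let concat = |x: String, y: String| x + &y;
    assert_eq!(reduce_in_order(&words, &forward, String::new(), concat), "abcd");
    assert_eq!(reduce_in_order(&words, &shuffled, String::new(), concat), "cadb");
}
```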
Recommendation: If your pipeline is performing roughly the same amount of work
for each item, you should probably use the Fixed strategy, to avoid paying the
synchronization cost of work-stealing. This is especially true if the amount of
work per item is small (e.g. some simple arithmetic operations). If the amount
of work per item is highly variable and/or large, you should probably use the
WorkStealing strategy (e.g. parsing strings, processing files).
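To make the fixed distribution concrete, the sketch below splits a slice into contiguous chunks, sums each chunk on its own scoped thread, and then reduces the per-thread results. It relies only on std::thread::scope; fixed_split_sum is a hypothetical helper and Paralight's internals may differ. No synchronization happens between workers while they process items, which is why this approach suits uniform, cheap per-item work.

```rust
use std::thread;

/// Hypothetical helper: sum `input` by giving each of up to `num_threads`
/// workers one contiguous chunk of the slice.
fn fixed_split_sum(input: &[u64], num_threads: usize) -> u64 {
    assert!(num_threads > 0, "need at least one thread");
    // Ceiling division so that every item lands in some chunk; at least 1
    // because `chunks(0)` would panic on an empty input.
    let chunk_size = ((input.len() + num_threads - 1) / num_threads).max(1);
    thread::scope(|s| {
        // Spawn all workers first, then join them.
        let handles: Vec<_> = input
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        // Final step: reduce the per-thread outputs into a single result.
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    })
}

fn main() {
    let input: Vec<u64> = (1..=10).collect();
    assert_eq!(fixed_split_sum(&input, 4), 55);
}
```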
Paralight allows pinning each worker thread to one CPU, on platforms that
support it. For now, this is implemented for platforms whose target_os is among
android, dragonfly, freebsd and linux (platforms that support
libc::sched_setaffinity() via the nix crate).
Paralight offers three policies in the CpuPinningPolicy enum:
- No doesn't pin worker threads to CPUs,
- IfSupported attempts to pin each worker thread to a distinct CPU on supported
  platforms, but proceeds without pinning if running on an unsupported platform
  or if the pinning function fails,
- Always pins each worker thread to a distinct CPU, panicking if the platform
  isn't supported or if the pinning function returns an error.
Whether CPU pinning is useful or detrimental depends on your workload. If you're
processing the same data over and over again (e.g. calling par_iter() multiple
times on the same data), CPU pinning can help ensure that each subset of the
data is always processed on the same CPU core and stays fresh in the lower-level
per-core caches, speeding up memory accesses. This however depends on the amount
of data: if it's too large, it may not fit in per-core caches anyway.
If your program is not running alone on your machine but is competing with other
programs, CPU pinning may be detrimental, as a worker thread will be blocked
whenever its required core is used by another program, even if another core is
free and other worker threads are done (especially with the Fixed strategy).
This of course depends on how the scheduler works on your OS.
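The difference between the three policies can be summarized as a small decision function. This is a sketch under stated assumptions: PinningPolicy, pinning_supported and resolve_pinning are hypothetical names that only model the behavior described above, the pinning call itself is simulated by a boolean, and an Err stands in for the panic that the Always policy would trigger.

```rust
/// Hypothetical mirror of the three policies (not Paralight's own type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum PinningPolicy {
    No,
    IfSupported,
    Always,
}

/// Whether the current target is one of the operating systems listed above.
fn pinning_supported() -> bool {
    cfg!(any(
        target_os = "android",
        target_os = "dragonfly",
        target_os = "freebsd",
        target_os = "linux"
    ))
}

/// Returns Ok(true) if the worker ends up pinned, Ok(false) if it runs unpinned,
/// and Err where the `Always` policy would panic.
fn resolve_pinning(policy: PinningPolicy, supported: bool, pin_call_ok: bool) -> Result<bool, &'static str> {
    match policy {
        PinningPolicy::No => Ok(false),
        // Failures are tolerated: the worker simply proceeds without pinning.
        PinningPolicy::IfSupported => Ok(supported && pin_call_ok),
        PinningPolicy::Always if !supported => Err("platform does not support pinning"),
        PinningPolicy::Always if !pin_call_ok => Err("pinning call failed"),
        PinningPolicy::Always => Ok(true),
    }
}

fn main() {
    println!("pinning supported on this target: {}", pinning_supported());
    assert_eq!(resolve_pinning(PinningPolicy::No, true, true), Ok(false));
    assert_eq!(resolve_pinning(PinningPolicy::IfSupported, false, true), Ok(false));
    assert_eq!(resolve_pinning(PinningPolicy::IfSupported, true, true), Ok(true));
    assert!(resolve_pinning(PinningPolicy::Always, true, false).is_err());
}
```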
With the WorkStealing strategy, inputs with more than u32::MAX elements are
currently not supported.
use paralight::iter::{IntoParallelIterator, ParallelIteratorExt};
use paralight::{CpuPinningPolicy, RangeStrategy, ThreadCount, ThreadPoolBuilder};

let pool_builder = ThreadPoolBuilder {
    num_threads: ThreadCount::AvailableParallelism,
    range_strategy: RangeStrategy::WorkStealing,
    cpu_pinning: CpuPinningPolicy::No,
};

let sum = pool_builder.scope(
    |mut thread_pool| {
        // 5 billion elements is more than u32::MAX (4,294,967,295), so this
        // input is not supported with the WorkStealing strategy.
        let input = vec![0u8; 5_000_000_000];
        input
            .par_iter(&mut thread_pool)
            .copied()
            .reduce(|| 0, |x, y| x + y)
    },
);
assert_eq!(sum, 0);
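A caller who may receive very large inputs could check the length before choosing the work-stealing strategy. The guard below is a minimal, hypothetical sketch (fits_work_stealing is not part of Paralight) that encodes the u32::MAX limit stated above. What to do when the check fails, for instance switching to the Fixed strategy, is left to the caller.

```rust
/// Hypothetical guard: true if `len` items stay within the `u32::MAX` limit
/// stated above for the work-stealing strategy.
fn fits_work_stealing(len: usize) -> bool {
    (len as u64) <= (u32::MAX as u64)
}

fn main() {
    assert!(fits_work_stealing(10));
    assert!(fits_work_stealing(u32::MAX as usize));
    // One element past the limit; this length only exists on targets where
    // `usize` is wider than 32 bits.
    if let Some(too_many) = (u32::MAX as usize).checked_add(1) {
        assert!(!fits_work_stealing(too_many));
    }
}
```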
Two optional features are available if you want to debug performance.
- log, based on the log crate, prints basic information about inter-thread
  synchronization: thread creation/shutdown, when each thread starts/finishes a
  computation, etc.
- log_parallelism prints detailed traces about which items are processed by
  which thread, and work-stealing statistics (e.g. how many times work was
  stolen among threads).
Note that in any case neither the input items nor the resulting computation are
logged. Only the indices of the items in the input may be present in the logs.
If you're concerned that these indices leak too much information about your
data, you need to make sure that you depend on Paralight with the log and
log_parallelism features disabled.
This is not an officially supported Google product.
See CONTRIBUTING.md for details.
This software is distributed under the terms of both the MIT license and the Apache License (Version 2.0).
See LICENSE for details.