# Summary

The RFC proposes a new asynchronous Rust runtime backed by [io-uring] as a new crate: tokio-uring. The API aims to be as close to idiomatic Tokio as possible, but will deviate when necessary to provide full access to io-uring's capabilities. It will also be compatible with existing Tokio libraries. The runtime will use an isolated thread-per-core model, and many types will be `!Send`.

[io-uring]: https://kernel.dk/io_uring.pdf

# Motivation

Tokio's current Linux implementation uses non-blocking system calls and epoll for event notification. With epoll, a tuned TCP proxy will spend [70% to 80%][overhead] of CPU cycles outside of userspace, including cycles spent performing syscalls and copying data between the kernel and userspace.

In 2019, Linux added a new API, io-uring, which reduces overhead by eliminating most syscalls and mapping memory regions used for byte buffers ahead of time. Early benchmarks comparing io-uring against epoll are promising; a TCP echo client and server implemented in C show up to a [60% improvement][bench]. Though not yet measured, using io-uring instead of Tokio's thread-pool strategy will likely provide significant gains for file system operations.

Because io-uring differs significantly from epoll, Tokio must provide a new set of APIs to take full advantage of the reduced overhead. However, Tokio's [stability guarantee][stability] means Tokio APIs cannot change until 2024 at the earliest. Additionally, the io-uring API is still evolving, with [new functionality][tweet] planned for the near future.

Instead of waiting for io-uring to mature and a Tokio 2.0 release, we will release a standalone crate dedicated to exposing an API optimized for io-uring. This new crate will be able to iterate rapidly with breaking changes without violating Tokio's stability guarantee. Applications deployed exclusively on Linux kernels 5.10 or later may choose this crate when taking full advantage of io-uring provides measurable gains. Examples of intended use cases include TCP proxies, HTTP file servers, and databases.

[overhead]: https://www.usenix.org/system/files/conference/nsdi14/nsdi14-paper-jeong.pdf
[bench]: https://github.com/frevib/io_uring-echo-server/blob/master/benchmarks/benchmarks.md
[stability]: https://tokio.rs/blog/2020-12-tokio-1-0#a-stability-guarantee
[tweet]: https://twitter.com/axboe/status/1371978266806919168

# Guide-level explanation

The tokio-uring crate provides a module layout similar to the Tokio crate, including modules such as `net` (TCP, UDP, and Unix domain sockets), `fs` (file system access), and `io` (standard in, out, and error streams). It also provides a `runtime` module containing an io-uring specific runtime. Modules such as `sync` and others containing async Rust utilities are not included in tokio-uring, in favor of the ones provided by the main Tokio crate.

```toml
[dependencies]
tokio-uring = "0.1"
```

```rust
fn main() {
    let rt = tokio_uring::runtime::Runtime::new().unwrap();

    rt.block_on(async {
        // The rest of the application comes here.
    });
}
```

The application's `main` function starts a tokio-uring runtime and launches its asynchronous logic within it. The tokio-uring runtime can drive both io-uring specific resources (e.g., `TcpStream` and `TcpListener`) and Tokio resources, enabling any library written for Tokio to run within the io-uring specific runtime.

## Submit-based operations

Operations on io-uring backed resources return futures representing the operation's completion.
The caller awaits the future to get the operation result.

```rust
let socket = my_listener.accept().await?;
```

The runtime communicates with the kernel using two single-producer, single-consumer queues. It submits operation requests, such as accepting a TCP socket, to the kernel using the submission queue. The kernel then performs the operation. On completion, the kernel returns the operation result via the completion queue and notifies the process. The `io_uring_enter` syscall flushes the submission queue and acquires any pending completion events. Upon request, this syscall may block the thread waiting for a minimum number of completion events. Both queues use memory shared between the process and the kernel and synchronize with atomic operations.

Operation futures provide an asynchronous cancel function, enabling the caller to await a clean cancellation.

```rust
let accept = tcp_listener.accept();

tokio::select! {
    (tcp_stream, addr) = &mut accept => { ... }
    _ = tokio::time::sleep(Duration::from_millis(100)) => {
        // Accept timed out, cancel the in-flight accept.
        match accept.cancel().await {
            Ok(_) => { ... } // operation canceled gracefully
            Err(Completed(tcp_stream, addr)) => {
                // The operation completed between the timeout
                // and the cancellation request.
            }
            Err(_) => { ... }
        }
    }
}
```

In practice, operation timeouts deserve a dedicated API to handle the boilerplate.

```rust
let (tcp_stream, addr) = tcp_listener
    .accept()
    .timeout(Duration::from_millis(100))
    .await?;
```

The cancel and timeout functions are inherent methods on operation future types. If the operation's future is dropped before the operation completes, the runtime will submit a cancellation request for the operation. However, cancellation is asynchronous and best-effort; the operation may still complete. In that case, the runtime discards the operation result.

The queues' single-producer characteristic means a single thread should own a given submission queue. Supporting a multi-threaded runtime requires either synchronizing pushes onto the submission queue or creating a queue pair per thread. The tokio-uring runtime uses an isolated thread-per-core model, differing from Tokio's other runtimes. Unlike Tokio's primary multi-threaded runtime, there is no work-stealing. Tasks remain on the thread that spawned them for their entire lifecycle. Each runtime thread will own a dedicated submission and completion queue pair, and operations are submitted using the submission queue associated with the current thread.

Operation completion futures will not implement `Send`, guaranteeing that they remain on the submitting thread to receive the operation result. Interestingly, resources such as `TcpListener` can be `Send` as long as they do not hold operation futures internally. It is therefore possible for a single resource to have two operations in-flight associated with different queues and threads.

Because operations are not `Send`, tasks awaiting these operations are also not `Send`, making them unable to be spawned with the spawn function from the Tokio crate. The tokio-uring crate will provide a spawn function that accepts `!Send` tasks; a sketch of its intended shape follows below.

When using multiple isolated runtime threads, balancing load between them becomes challenging. Applications must take care to ensure load remains balanced across threads, and strategies tend to vary. For example, a TCP server can distribute accepted connections across multiple threads, ideally while maintaining equal load across threads.
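
Before turning to concrete balancing strategies, the sketch below illustrates the intended shape of that local spawn function. The exact signature is illustrative rather than part of this proposal; the point is the absence of a `Send` bound.

```rust
use std::future::Future;

// Illustrative only: unlike `tokio::spawn`, neither the future nor its
// output needs to be `Send`, because the task never leaves the thread
// that spawned it. The `JoinHandle` return type is an assumption that
// mirrors Tokio's existing API.
pub fn spawn<F>(task: F) -> JoinHandle<F::Output>
where
    F: Future + 'static,
    F::Output: 'static,
{ ... }
```

In the pseudocode that follows, `spawn_on_each_thread` and `spawn` are assumed to be thread-local spawn functions of this shape.
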
One approach is to submit "accept" operations for the same listener concurrently across all the runtime threads while ensuring overloaded workers back off. Defining runtime load is out of this crate's scope and left to the application, though pseudocode follows.

```rust
let listener = Arc::new(TcpListener::bind(...));

spawn_on_each_thread(async {
    loop {
        // Wait for this worker to have capacity. This ensures a minimum
        // number of workers in the runtime are flagged as having capacity,
        // avoiding total starvation.
        current_worker::wait_for_capacity().await;

        let socket = listener.accept().await;

        spawn(async move { ... });
    }
})
```

## Reading and writing

Read and write operations require passing ownership of buffers to the kernel. When the operation completes, the kernel returns ownership of the buffer to the caller. The caller is responsible for allocating the buffer's memory and ensuring it remains alive until the operation completes. Additionally, while the kernel owns the memory, the process may not read from or write to the buffer. By designing the Rust APIs around ownership passing, these requirements are enforced at compile time. The following example demonstrates reading and writing with a file resource.

```rust
use tokio_uring::buf;

/// The result of an operation that includes a buffer. The buffer must
/// be returned to the caller when the operation completes successfully
/// or fails.
///
/// This is implemented as a new type in order to implement
/// `std::ops::Try` once the trait is stabilized.
struct BufResult(std::io::Result<usize>, buf::Slice);

// Positional read and write function definitions
impl File {
    async fn read_at(&self, buf: buf::Slice, pos: u64) -> BufResult;
    async fn write_at(&self, buf: buf::Slice, pos: u64) -> BufResult;
}

// The caller allocates a buffer
let buf = buf::IoBuf::with_capacity(4096);

// Read the first 1024 bytes of the file. When `std::ops::Try` is
// stable, the `?` can be applied directly on the `BufResult` type.
let BufResult(res, buf) = file.read_at(buf.slice(0..1024), 0).await;

// Check the result.
res?;

// Write some data back to the file.
let BufResult(res, _) = file.write_at(buf.slice(0..512), 1024).await;
res?;
```

When reading from a TCP stream, read operations remain in-flight until the socket receives data, which can take an arbitrary amount of time. If each read operation requires pre-allocating memory for the in-flight operation, the amount of memory consumed by the process grows linearly with the number of in-flight read operations. For applications with a large number of open connections, this can be problematic.

The io-uring API supports registering buffer pools with the ring and configuring read operations to use the buffer pool instead of a dedicated per-operation buffer. When the socket receives data, the kernel checks out a buffer from the pool and returns it to the caller. After reading the data, the caller returns the buffer to the kernel. If no buffers are available, the operation pauses until the process returns a buffer to the kernel, at which time the operation completes.

```rust
// ...
let buf = file.read_at_prepared(0, 1024, DefaultPool).await?;

// Write some data back to the file.
let BufResult(res, buf) = file.write_at(buf.slice(0..512), 1024).await;
res?;

// Dropping the buffer lets the kernel know the buffer may
// be reused.
drop(buf);
```

The runtime supports initializing multiple pools containing buffers of different sizes.
When submitting a read operation, the caller specifies the pool from which to draw a buffer.

```rust
// Allocate a buffer pool
let my_pool = BufferPool::builder()
    .buffer_size(16_384)
    .num_buffers(256)
    .build();

// Create the runtime
let mut rt = tokio_uring::runtime::Runtime::new()?;

// Provide the buffer pool to the kernel. This passes
// ownership of the pool to the kernel.
let pool_token = rt.provide_buffers(my_pool)?;

rt.block_on(async {
    // ...
    let buf = file.read_at_prepared(0, 1024, pool_token).await?;
});
```

## Buffer management

Buffers passed to read and write operations must remain alive and pinned to a memory location while operations are in-flight, ruling out `&mut [u8]` as an option. Additionally, read and write operations may reference buffers using a pointer or, when the buffer is part of a pre-registered buffer pool, using a numeric buffer identifier. The tokio-uring crate will provide its own buffer type, `IoBuf`, to use when reading and writing.

```rust
pub struct IoBuf {
    kind: Kind,
}

enum Kind {
    /// A vector-backed buffer
    Vec(Vec<u8>),

    /// Buffer pool backed buffer. The pool is managed by io-uring.
    Provided(ProvidedBuf),
}
```

Internally, the `IoBuf` type is backed either by individually heap-allocated memory or by a buffer pool entry. Acquiring an `IoBuf` is done via `IoBuf::with_capacity` or by checking out an entry from a pool.

```rust
// Individually heap-allocated
let my_buf = IoBuf::with_capacity(4096);

// Checked out from a pool
match my_buffer_pool.checkout() {
    Ok(io_buf) => ...,
    Err(e) => panic!("buffer pool empty"),
}
```

On drop, if the `IoBuf` is a buffer pool member, it is checked back in. If the kernel initially checked out the buffer as part of a read operation, an io-uring operation is issued to return it. Submitting that io-uring operation requires the buffer to remain on the same thread that checked it out, which is enforced by making the `IoBuf` type `!Send`. Buffer pools will also be `!Send` as they contain `IoBuf` values.

`IoBuf` provides an owned slice API allowing the caller to read into and write from a buffer's sub-ranges.

```rust
pub struct Slice {
    buf: IoBuf,
    begin: usize,
    end: usize,
}

impl IoBuf {
    fn slice(self, range: impl ops::RangeBounds<usize>) -> Slice { ... }
}

// Write a sub-slice of a buffer
my_file.write(my_io_buf.slice(10..20)).await
```

A slice end may go past the buffer's length but not past its capacity, enabling reads into uninitialized memory.

```rust
// The buffer's memory is uninitialized
let mut buf = IoBuf::with_capacity(4096);

let slice = buf.slice(0..100);
assert_eq!(slice.len(), 0);
assert_eq!(slice.capacity(), 100);

// Read data from a file into the buffer
let BufResult(res, slice) = my_file.read_at(slice, 0).await;
assert_eq!(slice.len(), 100);
assert_eq!(slice.capacity(), 100);
```

A trait argument for reading and writing may be possible as a future improvement. Consider a read API that takes `T: AsMut<[u8]>` for the buffer argument.

```rust
async fn read<T: AsMut<[u8]>>(&self, dst: T) { ... }

struct MyUnstableBuf {
    mem: [u8; 10],
}

impl AsMut<[u8]> for MyUnstableBuf {
    fn as_mut(&mut self) -> &mut [u8] {
        &mut self.mem[..]
    }
}
```

This read function takes ownership of the buffer; however, because `MyUnstableBuf` stores its data inline, any pointer to the buffer obtained from a value becomes invalid when the value moves. Storing the buffer value at a stable location while the operation is in-flight should be sufficient to satisfy the safety requirements.

## Closing resources

Idiomatically in Rust, closing a resource is performed in the drop handler, and within an asynchronous context, the drop handler should be non-blocking.
Closing an io-uring resource requires canceling any in-flight operations, which is an asynchronous process.

Consider an open `TcpStream` associated with file descriptor (FD) 10. A task submits a read operation to the kernel, but the `TcpStream` is dropped and closed before the kernel processes the request. A new `TcpStream` is accepted, and the kernel reuses FD 10. At this point, the kernel sees the original read operation request with FD 10 and completes it against the new `TcpStream`, not the intended one. The runtime receives the completion of the read operation and discards the result because the associated operation future is gone, resulting in the caller losing data from the second `TcpStream`. This problem occurs even when issuing a cancellation request for the read operation: there is no guarantee the kernel will see the cancellation request before completing the operation.

There are two options for respecting both requirements (a non-blocking drop handler and safely closing the resource), neither ideal: closing the resource in the background or blocking the thread in the drop handler. If the resource is closed in the background, the process may encounter unexpected errors, such as "too many open files." Blocking the thread to cancel in-flight operations and close the resource prevents the runtime from processing other tasks and adds latency across the system.

Instead, tokio-uring will provide an explicit asynchronous close function on resource types.

```rust
impl TcpStream {
    async fn close(self) { ... }
}

my_tcp_stream.close().await;
```

The resource must still tolerate the caller dropping it without being explicitly closed. In this case, tokio-uring will close the resource in the background, avoiding blocking the runtime. The drop handler will move ownership of the resource handle to the runtime and submit cancellation requests for any in-flight operation. Once all existing in-flight operations complete, the runtime will submit a close operation.

If the drop handler must close a resource in the background, it will notify the developer by emitting a warning message using [`tracing`]. In the future, it may be possible for Rust to provide a `#[must_not_drop]` attribute. This attribute would result in compilation warnings if the developer drops a resource without using the explicit close method.

[`tracing`]: https://github.com/tokio-rs/tracing

## Byte streams

Byte stream types, such as `TcpStream`, will not provide read and write methods (see "Alternatives" below for the reasoning). Instead, byte streams will manage their buffers internally, as described in ["Notes on io-uring"][notes], and implement buffered I/O traits, such as `AsyncBufRead`. The caller starts by waiting for the byte stream to fill its internal buffer, reads the data, and marks the data as consumed.

[notes]: https://without.boats/blog/io-uring/

```rust
// `fill_buf()` is provided by `AsyncBufRead`
let data: &[u8] = my_stream.fill_buf().await?;
println!("Got {} bytes", data.len());

// Consume the data
my_stream.consume(data.len());
```

Internally, byte streams submit read operations using the default buffer pool. Additional methods exist to take and place buffers, supporting zero-copy piping between two byte streams.

```rust
my_tcp_stream.fill_buf().await?;
let buf: IoBuf = my_tcp_stream.take_read_buf();

// Mutate `buf` if needed here.

my_other_stream.place_write_buf(buf);
my_other_stream.flush().await?;
```

Implementing buffer management on the `TcpStream` type requires tracking the in-flight read and write operations, making the `TcpStream` type `!Send`.
Sending a `TcpStream` across threads is still possible by first converting the io-uring `TcpStream` to a standard library `TcpStream`, sending that value to the new thread, and converting it back to an io-uring `TcpStream`.

Unlike the standard library, the `File` type does not expose a byte stream directly. Instead, the caller requests a read or write stream, making it possible to support multiple concurrent streams. Each file stream maintains its own cursor and issues positional read and write operations based on that cursor.

```rust
let read_stream = my_file.read_stream();
let write_stream = my_file.write_stream();

read_stream.fill_buf().await?;
let buf: IoBuf = read_stream.take_read_buf();

write_stream.place_write_buf(buf);
write_stream.flush().await?;

// Because `read_stream` and `write_stream` maintain separate
// cursors, `my_file` is unchanged at the end of this example.
```

Byte streams may have a configurable number of concurrent in-flight operations. Achieving maximum throughput [requires configuring][modern-storage] this value to take advantage of the underlying hardware's characteristics.

[modern-storage]: https://itnext.io/modern-storage-is-plenty-fast-it-is-the-apis-that-are-bad-6a68319fbc1a

## Traits

The tokio-uring crate will not expose any traits. The crate does not aim to be a point of abstraction for submission-based I/O models. Instead, to provide compatibility with the existing Rust asynchronous I/O ecosystem, byte stream types, such as `TcpStream`, will implement Tokio's [AsyncRead], [AsyncWrite], and [AsyncBufRead] traits. Using these traits requires an additional copy between the caller's buffer and the byte stream's internal buffer compared to taking and placing buffers.

[AsyncRead]: https://docs.rs/tokio/1/tokio/io/trait.AsyncRead.html
[AsyncWrite]: https://docs.rs/tokio/1/tokio/io/trait.AsyncWrite.html
[AsyncBufRead]: https://docs.rs/tokio/1/tokio/io/trait.AsyncBufRead.html

# Implementation details

Creating the tokio-uring runtime initializes the io-uring submission and completion queues and a Tokio current-thread epoll-based runtime. Instead of waiting on completion events by blocking the thread on the io-uring completion queue, the tokio-uring runtime registers the completion queue with the epoll handle. By building tokio-uring on top of Tokio's runtime, existing Tokio ecosystem crates can work with the tokio-uring runtime.

When the kernel pushes a completion event onto the completion queue, `epoll_wait` unblocks and returns a readiness event. The Tokio current-thread runtime then polls the io-uring driver task, draining the completion queue and notifying completion futures.

As with Tokio, using an I/O type does not require explicitly referencing the runtime. Operations access the current runtime via a thread-local variable. Requiring a handle would be intrusive, as the application would have to pass the handle throughout its code. Additionally, an explicit handle would require storing a pointer-sized value in each resource, bloating those types.

The disadvantage of such an approach is that there is no way to guarantee at compile time that operation submission happens within the runtime context. Attempting to use a tokio-uring resource from outside of the runtime will result in a panic.

```rust
use tokio_uring::runtime::Runtime;
use tokio_uring::net::TcpListener;

fn main() {
    // Binding a TcpListener does not require access to the runtime.
    let listener = TcpListener::bind("0.0.0.0:1234".parse().unwrap());

    let rt = Runtime::new().unwrap();

    rt.block_on(async {
        // This works, as `block_on` sets the thread-local variable.
        let _ = listener.accept().await;
    });

    // BOOM: panics because it is called outside of the runtime.
    futures::executor::block_on(async {
        let _ = listener.accept().await;
    });
}
```

## Operation state

Most io-uring operations reference resources, such as buffers and file descriptors, for the kernel to use. These resources must remain available while the operation is in-flight: any memory referenced by pointers must remain allocated, and the process must not access that memory. Because asynchronous Rust allows dropping futures at any time, operation futures may not own data referenced by the in-flight operation. Instead, the tokio-uring runtime will take ownership of and store resources referenced by operations while they are in-flight.

```rust
struct IoUringDriver {
    // Storage for state referenced by in-flight operations
    in_flight_operations: Slab<Operation>,

    // The io-uring submission and completion queues.
    queues: IoUringQueues,
}

struct Operation {
    // Resources referenced by the kernel; these must stay
    // available until the operation completes.
    state: State,

    lifecycle: Lifecycle,
}

enum State {
    // An in-flight read operation, `None` when reading into
    // a kernel-owned buffer pool
    Read { buffer: Option<buf::Slice> },

    Write { buffer: buf::Slice },

    // Accept a TCP socket
    Accept { ... },

    Close { ... },

    // ... other operations
}

enum Lifecycle {
    /// The operation has been submitted to uring and is currently in-flight
    Submitted,

    /// The submitter is waiting for the completion of the operation
    Waiting(Waker),

    /// The submitter no longer has interest in the operation result.
    Ignored,

    /// The operation has completed. The completion result is stored.
    Completed(Completion),
}

// Completion result returned from the kernel via the completion queue
struct Completion {
    result: io::Result<u32>,
    flags: u32,
}
```

The `Operation` struct holds any data referenced by the operation submitted to the kernel, preventing the data from being dropped early. The `lifecycle` field tracks whether the operation is in-flight, has completed, or has had its associated future dropped. The `lifecycle` field also passes operation results from the driver to the future via the `Completed` variant.

When a task starts a read operation, the runtime allocates an entry in the in-flight operation store, storing the buffer and initializing the lifecycle to `Submitted`. The runtime then pushes the operation to the submission queue but does not synchronize it with the kernel. Synchronization happens once the task yields to the runtime, enabling the task to submit multiple operations without synchronizing each one. Delaying synchronization can add a small amount of latency. While not enforced, tasks should execute for no more than 500 microseconds before yielding.

When the runtime receives completion results, it must complete the associated operation futures. For each, the runtime loads the in-flight operation state, stores the result, transitions the lifecycle to `Completed`, and notifies the waker. The next time the caller's task executes, it polls the operation's future, which completes and returns the stored result.

If the operation's future is dropped before the operation completes and the operation request is still in the submission queue, the drop handler removes the request. Otherwise, it sets the lifecycle to `Ignored` and submits a cancellation request to the kernel.
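
A sketch of that drop path is shown below. The `Op` future type, its `index` into the in-flight operation slab, and the driver helper methods are illustrative names rather than part of the proposed API.

```rust
impl Drop for Op {
    fn drop(&mut self) {
        // Hypothetical accessor for the thread-local driver state.
        let driver = current_driver();

        if driver.still_in_submission_queue(self.index) {
            // The kernel has not seen the request yet: remove it from the
            // local submission queue and release the slot immediately.
            driver.remove_from_submission_queue(self.index);
            driver.in_flight_operations.remove(self.index);
        } else {
            // The kernel may already be processing the request. Keep the
            // slot alive (it owns memory the kernel references), mark the
            // result as ignored, and submit a cancellation request.
            driver.in_flight_operations[self.index].lifecycle = Lifecycle::Ignored;
            driver.submit_cancel(self.index);
        }
    }
}
```
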
The cancellation request will attempt to terminate the operation, causing it to complete immediately with an error. Cancellation is best-effort; the operation may or may not terminate early. If the operation does complete, the runtime discards the result. The runtime maintains the internal operation state until completion, as this state owns data the kernel may be referencing.

## Read operations

Depending on the buffer it receives, a read operation can map to multiple io-uring representations. The io-uring API provides two read operation codes: `IORING_OP_READ` and `IORING_OP_READ_FIXED`. The first accepts a buffer as a pointer, and the second takes a buffer as an identifier referencing a buffer pool and entry. Additionally, the `IORING_OP_READ` operation can accept a null buffer pointer, indicating that io-uring should pick a buffer from a provided buffer pool. The tokio-uring runtime will determine which opcode to use based on the kind of `IoBuf` it receives. The `Vec` kind maps to the `IORING_OP_READ` opcode, and the `Provided` kind maps to `IORING_OP_READ_FIXED`.

## Prior art

[Glommio] is an existing asynchronous Rust runtime built on top of io-uring. This proposal draws heavily from ideas presented there, tweaking concepts to line up with Tokio's idioms. [`@withoutboats`][boats] has explored the space with [ringbahn]. The tokio-uring crate is built on the pure-Rust [io-uring][io-uring-crate] crate, authored by [@quininer]. That crate provides a low-level interface to the io-uring syscalls.

[Glommio]: https://github.com/DataDog/glommio
[boats]: https://github.com/withoutboats
[ringbahn]: https://github.com/ringbahn/ringbahn
[@quininer]: https://github.com/quininer/
[io-uring-crate]: https://github.com/tokio-rs/io-uring/

## Future work

The main Tokio crate will likely adopt concepts from tokio-uring in the future. The most obvious area is Tokio's file system API, currently implemented using a thread pool. The current API would remain, and the implementation would use io-uring when supported by the operating system.

The tokio-uring APIs may form the basis for a Tokio 2.0 release, though this cannot happen until 2024 at the earliest. As an intermediate step, the tokio-uring crate could explore supporting alternate backends, such as epoll, kqueue, and IOCP. The focus will always remain on io-uring.

The current design does not cover registering file descriptors with the kernel, which improves file system access performance. After registering a file descriptor with io-uring, it must not move to a different thread, implying `!Send`. Because many resource types are `Send`, a likely path for supporting the feature is adding new types to represent the registered state. For example, `File` could have a `RegisteredFile` analog. Making this change would be forward compatible and not impact the current design.

## Alternatives

### Use a work-stealing scheduler

An alternative approach could use a work-stealing scheduler, allowing underutilized worker threads to steal work from overloaded workers. While work-stealing is an efficient strategy for balancing load across worker threads, the required synchronization adds overhead. [An earlier article] on the Tokio blog includes an overview of various scheduling strategies.

The tokio-uring crate targets use cases that can benefit from taking full advantage of io-uring at the expense of giving up Tokio's portable API. These use cases will also benefit from reduced synchronization overhead and fine-grained control over thread load-balancing strategies.

[An earlier article]: https://tokio.rs/blog/2019-10-scheduler

### Read and write methods on TcpStream

Read and write operations on byte stream types, such as `TcpStream`, are stateful. Each operation operates on the next chunk of the stream, advancing a logical cursor. The combination of submit-based operations with drop-to-cancel semantics presents a challenge. Consider the following.

```rust
let buf = IoBuf::with_capacity(4096);

// This read will be canceled.
select! {
    _ = my_tcp_stream.read(buf.slice(..)) => unreachable!(),
    _ = future::ready(()) => {}
}

let buf = IoBuf::with_capacity(4096);
let (res, buf) = my_tcp_stream.read(buf.slice(..)).await;
```

Dropping the first read operation cancels it, and because it never completes, the second read operation should read the first packet of data from the TCP stream. However, the kernel may have already completed the first read operation before seeing the cancellation request. A naive runtime implementation would drop the completion result of any canceled operation, which would result in losing data. Instead, the runtime could preserve the data from the first read operation and return it as part of the second read. The process would not lose data, but the runtime would need to perform an extra copy or return a different buffer to the caller than the one it submitted.

The proposed API is not vulnerable to this issue: resources track their own operations, preventing the operations from being dropped as long as the resource is open.

### Expose a raw io-uring operation submission API

The proposed tokio-uring API does not include a strategy for the user to submit custom io-uring operations. Any raw API would be unsafe or would require the runtime to support taking opaque data as a trait object. Given that io-uring has a well-defined set of supported operations, tokio-uring opts to support each operation explicitly. The application can also create its own set of io-uring queues using the io-uring crate directly.

### Load-balancing spawn function

The tokio-uring crate omits a spawn function that balances tasks across threads, leaving this to future work. Instead, the application should manage its own load balancing. While there are a few common balancing strategies, such as round-robin, randomized, and power-of-two-choices, there is no one-size-fits-all strategy. Additionally, some load-balancing methods require application-specific metrics such as worker load.

Also, consider the following pseudocode representing a typical accept loop.

```rust
loop {
    let (socket, _) = listener.accept().await?;

    spawn(async move {
        let request = read_request(&socket).await;
        let response = process(request).await;
        write_response(response, &socket).await;
    });
}
```

Often, the request data is already buffered when the socket is accepted. If the spawn call results in the task moving to a different worker thread, reading the request is delayed by cross-thread synchronization. Additionally, it may become possible in the future to batch an accept operation with a read on the accepted socket, which would complicate moving the accepted socket to a new thread. The details of such a strategy are not yet finalized but may impact a load-balancing spawn function.