EScp 0.8 ======== EScp is an application for high speed data transfer. This version has been tested across the WAN at ~200 gbit/s at SC23, disk-to-disk, limited by NIC. It is designed to be a modern transport protocol replicating the functionality of SCP. Some features include: * Multiple TCP/IP streams per transfer ( 1/thread ) * Lockless multi threaded application * ZeroCopy (When blocksize < L3), uses shared memory, and page aligned * AES-GCM 128 over the wire encryption, optimized for performance * Authentication to remote systems through SSH * Swappable I/O Engines ( Currently: DUMMY & POSIX ) * Block based interface to transferring data (API expected in 0.9) * Compression (using zstd) * Checksums, Direct I/O, API's. In general, the approach taken by EScp is to have algorithms optimized to give good performance across all metrics (i.e. billions of small files, single large file), that scale linearly with core count. EScp isn't intended to replace SCP; Instead it is meant to fill a void in efficiently transferring science data. That means supporting different use cases, such as block based data transfer instead of file based data, supporting the types of files used in science, such as sparse files, and, importantly, being able to transfer at any scale, securely, efficiently, and with commodity hardware. EScp is also intended to fit into existing science frameworks, and is therefore meant to be called through an API. If these features interest you, please reach out. EScp is currently only available for x86 on Linux. For the most part, this isn't a case of it not being able to support other platforms, but more a reflection of the type of servers that are typically used for science. If you have a use case for something that isn't supported, please create a GitHub issue. We also accept Pull requests if you happen to have a patch for us. RELEASE NOTES ============= EScp 0.8 adds feature parity with most SCP flags, improves session handling, and fixes a number of edge cases around things like directory handling, zero byte files, and so on. Overall the result should be further improvements to stability, performance, and ease of use. **WARNING:** This software is in development. If you are using a *tagged* release, for instance, 0.8.0, it has passed a number of tests to verify that the application works as the developer expects. If you are using the latest git version expect that things will break. In any event, while EScp has successfully transferred PBs of data and hundreds of millions of files. It may fail for you! If you run into a case where EScp is failing, but SCP is not, please create a bug report. At this point in the EScp software life-cycle, it should be transferring files safely and successfully, although there are still a few differences. For instance, EScp won't transfer symlinks and it only preserves attributes on files. If you found this software useful and/or have any questions/requests, please reach out. The primary author, see AUTHOR, can be reached using the first letter of the first name followed by the full lastname @ ESnet domain. Bugs/Issues are appreciated and should be reported using the Github Issues feature. How EScp is different from SCP ============================== After initiating a transfer with EScp, system SSH is invoked to connect to the remote host, however, unlike SCP, it uses SSH only to spawn a receiver and transfer session keys. Once complete, EScp connects to the EScp service on the remote host, typically using TCP port 1232-42 (choosing the first open port). Once connected it encrypts all communications using AES-GCM-128 and the session keys from earlier. Because EScp implements its own encryption protocol, it is able to optimize this to be fast, as well as configure the TCP ports for long haul data transmission. This enables EScp to be orders of magnitude faster than SCP. All transfers are encrypted, and this behavior is not possible to disable. By default, the sender computes a checksum when reading the file. The receiver computes a checksum when writing the file and sends that checksum to the sender, which the sender then verifies. While the checksum is not cryptographic in nature, between network encryption and file check summing, the protocol is thought to securely and reliably copy data. As EScp is a performance oriented transfer tool, all files default to `O_DIRECT` and then switch to indirect mode if that fails on a target/source filesystem. In rare cases, `O_DIRECT` can actually make performance worse and you may need to disable this feature. EScp uses multiple I/O threads with one TCP stream per thread. You may specify the number of I/O threads via the CLI. There are also threads that iterate directories/files, this is so that EScp can take advantage of parallelism available in your file system and on your CPU. This allows EScp to scale transfer performance with your CPU, as a single thread/core is limited in how much data it can process. Logging on EScp is different from SCP. Please see DEBUGGING for details on logging. Additionally, as this software is still in-development, not all error messages bubble-up correctly and you may need to enable logging to help to understand why something is not working. EScp does support a conf-file for some options. The conf file, `/etc/escp.conf`, is a YAML file. An example of some of the things you can set is shown below: ``` cpumask: FFFF nodemask: 1 ``` USAGE ===== ``` Energy Sciences Network transfer tool (EScp) Usage: escp [OPTIONS] ... Arguments: ... Source Files/Path Destination host: [default: ] Options: -P, --port SSH Port --escp_port ESCP Port [default: 1232] -v, --verbose Verbose/Debug output -q, --quiet -A, --agent Enable SSH Agent Forwarding -c, --cipher CIPHER used by SSH -i, --identity IDENTITY pubkey for SSH auth -l, --limit LIMIT/thread (bytes/sec) using SO_MAX_PACING_RATE -p, --preserve Preserve source attributes (TODO) -C, --compress Enable Compression --sparse Sparse file support, use with compression -r, --recursive Copy recursively -o SSH_OPTION to SSH -S, --ssh SSH binary [default: ssh] -D, --escp EScp binary [default: escp] --blocksize [default: 1M] --ioengine posix,dummy [default: posix] -t, --parallel # of EScp parallel threads [default: 4] --bits Display speed in bits/s --nodirect Don't enable direct mode --nochecksum Don't enable file checksum -L, --license Display License -h, --help Print help -V, --version Print version Example: # Transfer file1 and file2 to server host using SSH escp file1 file2 host:/remoteDirectory ``` INSTALL ======= The recommended approach to using EScp is by compiling it yourself and then to use the resultant RPM/DEB file to install on your systems. COMPILING ========= ``` # Install system dependencies (Debian) apt install cmake libtool g++ libnuma-dev nasm autoconf automake \ curl # for get rust stanza \ libclang-dev # for bindgen # Install system dependecies (RHEL Family) sudo dnf group install "Development Tools" dnf install epel-release dnf install nasm autoconf automake libtool cmake curl # Get rust curl https://sh.rustup.rs -sSf | sh . "$HOME/.cargo/env" # Build escp ./mk.sh # If mk.sh fails because you had missing dependencies, # remove the build directory and then re-run it. If that # still fails, cat the file and attempt to run the # build process for libdtn manually until you have found # the error. # You now need to install escp, the suggested path is to create an RPM/DEB cargo install cargo-deb cargo deb # or cargo install cargo-rpm cargo rpm init cargo rpm build # Then install as (Update the version as appropriate): sudo dpkg -i target/debian/escp_0.7.0_amd64.deb # Debian # or dnf install target/release/rpmbuild/RPMS/x86_64/escp-0.8.0*.rpm # Redhat Family # For development cargo install bindgen-cli --version 0.68.1 bindgen libdtn/include/dtn.h -o src/escp/bindings.rs --use-core --generate-cstr # flatc version 23.5.26 # You will need to grab the tagged version in GIT and compile it. # You probably also want gdb/valgrind/whatever your favorite debug tools are # You can enable autocomplete by adding below to .bashrc complete -F _scp -o nospace escp _completion_loader scp ``` KNOWNBUGS ========= * Error messages are inconsistent. If you find that the sender is giving you an error message you can't parse, check both the client and server log. - /tmp/escp.log.sender - /tmp/escp.log.receiver * Dummy engine can only be used with a few files * You can't disable encryption * Check https://github.com/ESnet/EScp/issues TUNING ====== For the most part, performance should be good without any tuning, however, EScp does allow for tuning most parameters associated with a transfer to optimize your data transfer workflow. First, check to see where you bottleneck is. Generally speaking you want to figure out, are you storage bound, network bound, or CPU bound. To test the network, execute something like this: ``` # Create dummy 1TB file dd if=/dev/zero of=1T bs=1 count=1 seek=1T # Transfer it using dummy engine escp 1T remote-host: --ioengine=dummy ``` The transfer should quickly settle at some number. If this is faster than your transfer from disk, congratulations, you are disk bound. You probably need to increase the block size until it is bigger than your RAID size (if using RAID). You can query your block size with: ``` # mdadm --detail /dev/md127 /dev/md127: Version : 1.2 Creation Time : Mon Nov 14 17:57:39 2022 Raid Level : raid10 Array Size : 7813406720 (7.28 TiB 8.00 TB) Used Dev Size : 1562681344 (1490.29 GiB 1600.19 GB) Raid Devices : 10 Total Devices : 10 Persistence : Superblock is persistent Intent Bitmap : Internal Update Time : Tue Jul 2 23:17:28 2024 State : clean, degraded, recovering Active Devices : 8 Working Devices : 10 Failed Devices : 0 Spare Devices : 2 Layout : near=2 Chunk Size : 512K . . . # ^^^ In this example, our block size (default 1M, would need to be >= 512K) # xfs_info /storage meta-data=/dev/md127 isize=512 agcount=32, agsize=61042304 blks = sectsz=512 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=1, rmapbt=0 = reflink=1 bigtime=0 inobtcount=0 data = bsize=4096 blocks=1953351680, imaxpct=5 = sunit=128 swidth=640 blks naming =version 2 bsize=4096 ascii-ci=0, ftype=1 log =internal log bsize=4096 blocks=521728, version=2 = sectsz=512 sunit=8 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 # But the filesystem is configured with 640*4096 == 2.5MB block size # So we would pick the next largest power of 2 for our block size ``` blocksize if the unit that will be used for reading from disk and sending across the network. It should be as big as possible without making it so big that you no longer fit into L3 Cache. It should be aligned closely to whatever your storage is using. In the example above, our block size is > 2MB but <= 4MB, so we use: ``` $ escp -b 4M [host]:[path] # Aligned to 2^n ``` If our storage optimizations are either unneeded or insufficient, lets move on to network or CPU. In general, for network we just increase threads until we are happy with our performance: ``` $ escp -b 4M -t 8 [host] # Keep our block size but use 8 threads ``` However, this sometimes does not scale as expected, particularly if you are using a NUMA enabled processor. In these sorts of cases, lets pin EScp to run on a NUMA node different than our network card. The preferred approach is to us `config_tool`, included with EScp. Example: ``` # escp/scripts/config_tool.py eno1 > /etc/escp.conf ``` will result in something like: /etc/escp.conf ``` cpumask: "FFFF0000" nodemask: "2" # Use 2nd NUMA node: cores 16-31 w/ memory node 2 ``` This will pin EScp to the netork card eno1. Caution: `config_tool` is misleading because you really want EScp to run on the NUMA node that *isn't* where your transfer NIC is located. In this particular example, our transfer NIC is called eth100 on a different NUMA node, so the config works, but, you may find that you need to hand edit it, reversing the config. When editing by hand, be careful with your hex numbers (FF != FF00). nodemask and cpumask are passed to `set_mempolicy` and `sched_setaffinity`. cpumask is HEX, nodemask INT, both are YAML Strings. You may want to enable verbose logging to verify that the mask was applied and/or check `htop` when transferring your files. If you did all of these things and feel that the network and block storage are still not the bottleneck, you can reduce CPU usage doing things like: ``` escp --nochecksum ``` If you are CPU bound, you should see a slight uptick in performance by disabling checksums. DEBUGGING ========= Your first step is to enable verbose logs; ``` escp -v --logfile /path/to/logs ``` If this does not resolve your issue, consider reaching out or attempting to debug using system debuggers. If debugging you may need to compile with debug symbols (the default unless --release is specified), and then running escp through a debugger. As an example: ``` # Start server rust-gdb --args escp --mgmt /path/to/mgmt/socket --server dont care: # break rust_panic # Start client rust-gdb --args escp --mgmt /path/to/mgmt/socket file localhost: ``` In the example above, rather than connecting the management interface automatically (using SSH), the management interface is through a socket on the server. If you are unable to replicate the issue locally, you can start escp through a debugger but specify your remote host (as per normal). Once the session is established you can connect your debugger on the remote end using the --pid option. You can also do something exotic (insecure) like create a management socket over TCP using socat: ``` socat TCP4-LISTEN:1234 UNIX-CONNECT:/tmp/foo.sock socat UNIX-RECVFROM:/tmp/foo.sock TCP4:yikes.com:1234 ``` Things to watch out for with debug logs: 1) It will slow down EScp, in some cases noticably. This is espcially apparent with RUST based events which block until written. 2) libDTN events are logged to a circular buffer. This buffer can overflow and if an overflow occurs the oldest log entries will be overwritten. 3) libDTN logs are not timestamped, although they are sequential. The timestamp comes from the RUST backend when the message is pulled off the queue. This means that a libDTN message can show up before an EScp message when in reality the EScp message occured before. SECURITY ======== EScp works by establishing an SSH session to a remote host (using system SSH binary) and then starting EScp on the receiver. After exchanging keys through the SSH channel, the sender connects the DATA channel using the port specified by receiver. All data (outside of SSH) is encrypted with AES128-GCM. The on-the-wire format is shown below: ``` /* /-----+--------------+----------\ * | IV | Payload | HMAC | * | 8 | | 16 | * | AAD | Encrypted | Auth tag | * \-----+--------------+----------/ */ ``` There are two data types, metadata and file, which is inferred from the IV. Each datatype specifies the length of the payload. Internally EScp uses AES-GCM using the `ISA-L_crypto` library. The implementation follows NIST 800-38D guidelines and has not been peer-reviewed. If you want more information, check `network_recv` or `network_initrx` in `libdtn/src/dtn.c`. DEV NOTES ========= ``` Compression: EScp supports compression through the zstd simplified API. It would be better to use the streaming API to conserve state between frames, and thereby reduce over the wire data. Checksums: It would be nice to have an option to write checksums for a transfer to a file and/or support alternative checksums. Sparse Files: Sparse files are a receiver feature, which is backwards. It makes more sense to just not send the block from the get-go. This is ameliorated by compression, however, still bad. We could also plug-into OS support for sparse files and avoid reading the data in the first place. Error Messages (Expected in 0.9): VRFY macro is used to terminate execution if the receiver runs into a condition in which it is impossible to continue. That error message should be gracefully sent to sender before terminating. NUMA Pinning: It would be nice to do NUMA pinning automatically. I attempted this in an earlier revision, but the library I was using was too heavy-weight and it seemed not worth the benefit. It would be nice to revisit this. SHM Engine/Block Based API (Expected in 0.9): Earlier versions of EScp contained a SHM engine to allow applications to write directly to an EScp transfer stream. This engine was overly complicated because of the Python Layer, but, it should now be possible to do in a relatively light weight fashion. Test Harness (Expected in 0.9): Testing right now is against a series of data sets and is manual. I created a stub test-harness, and this needs to be expanded. Very Small Files in control stream (Expected in 0.9) "-3" Mode (Try for 0.9, otherwise 1.0) Profiler/Tracer (Expected in 0.9): Optional functionality that can be used to debug performance issues (both internal to the software and external, like disk/network). Local Copy (1.0 Feature): Support High Performance local copy Send/Receive (1.0 Feature): Currently EScp only supports transfers from client to server, should support both directions. ``` DESIGN LIMITS ============= * Files are limited to 2^64 bytes * A max of 2^56 files can be transferred per session. * AES-GCM uses a 2^64 counter, which limits the number of blocks sent in a transfer session to 2^64. * Limited to 28 transfer threads AUTHOR ====== EScp is written by Charles Shiflett with the support of [ESnet](es.net). EScp is a side project and is not in any way an official or supported project of ESnet, The University of California, Lawrence Berkeley National Lab, or the US Department of Energy. I'd like to thank my team at ESnet; Anne White, Goran Pejović, Dhivakaran Muruganantham, George Robb, Luke Baker and Shawn Kwang, as well as Brian Tierney, Eli Dart, Ezra Kissel, Ken Miller, Eric Pouyoul, Jason Zurawski, Seyoung You, and Andrew Wiedlea for their support, encouragement, and assistance in developing EScp. Lastly, thanks to you, for using this application. EScp was written for you! If you are interested in contributing, you are in the unique position of being the first external collaborator to be added here. RELEASES ======== ``` SHA256 NAME 6cb3da78ece447c2ab7b9dd992bd92d30c3892075b250f2a58cba5ffed4b5974 0.8.0.tar.gz 755fd7c88a9983e45250e7d3ea2f5942295b5f3bdedb980cb06cd6deede212f2 EScp-0.7.1.tar.gz c91d47a3b0c6578e7a727af26700dabd79e0acbf0db7eeffbf3151b48980b8a6 EScp-0.7.0.zip ``` * [0.8.0.tar.gz](https://github.com/esnet/EScp/archive/refs/tags/0.8.0.tar.gz) * [0.7.1.tar.gz](https://github.com/esnet/EScp/archive/refs/tags/0.7.1.tar.gz) Changes from 0.7.1 to 0.8.0 (07 Nov 2024): * Breaks compatability with previous versions * Change to On-Wire format; Drops unneeded crypto wrapper. Change other message headers. * Add Compression (Using zstd) - use --compress flag or -C - automatic compress large metadata chunks (file information/verification) * Add Sparse File support (--sparse) * Add receiver timeout + keepalive (Minimize Zombie Receivers) * Session Init Changes * Add Preserve support (just for files) * Only log to file/syslog if specified on command line * Improve counters on receivers: - `bytes_network` shows bytes transmitted over the wire - `bytes_disk` shows bytes written to disk - `bytes_compressed` total of "data" bytes sent over the wire - Summary written to receiver log (if logging to a file/syslog) * Change how sender/receiver negotiate session termination - Removes session close delays and better file verification * Fix race condition and/or delay on file iteration * Fix directory traversal and close zero bytes file descriptors * Check that files don't leave prefix * Update how progress bar is displayed * Fix how empty files are handled * Long delays unschedule thread (as opposed to busy waiting) * Many fixes, much polishing, and more cleaning * Happy Birthday! Changes from 0.7.0 to 0.7.1 (20 June 2024): * Checksum feature enabled * Change how transfer progress is displayed when calculating files * Code cleanup w/ clippy & cargo audit Changes from 0.6.0 to 0.7.0 (5 June 2024): * Complete rewrite of python code * Change to fully lockless design - Previous versions locked when iterating through files * CLI Syntax more closely mirrors scp, w/ some borrow from iPerf * Config File format changed to YAML * HashMap for indexing file descriptors/file numbers LICENSE (BSD3) ============== ``` ESnet Secure Copy (EScp) Copyright (c) 2021-24, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy). All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: (1) Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. (2) Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. (3) Neither the name of the University of California, Lawrence Berkeley National Laboratory, U.S. Dept. of Energy nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. You are under no obligation whatsoever to provide any bug fixes, patches, or upgrades to the features, functionality or performance of the source code ("Enhancements") to anyone; however, if you choose to make your Enhancements available either publicly, or directly to Lawrence Berkeley National Laboratory, without imposing a separate written license agreement for such Enhancements, then you hereby grant the following license: a non-exclusive, royalty-free perpetual license to install, use, modify, prepare derivative works, incorporate into other computer software, distribute, and sublicense such enhancements or derivative works thereof, in binary and source code form. ```