Measuring throughput with benchmarking tool --------- * start the broker ``` RUST_LOG=rumq_broker=info cargo run --release ``` * run the tool with the configuration you like ``` ./benchmark --help ./benchmark -c 10 -m 10000 (open 10 connection with 10000 publishes each) ``` * results look like this ``` tekjar@thinkpad:~/Workspace/rumq/utils/benchmark$ ./benchmark -c 10 -m 50000 99% |||||||||||||||||||||||||||||||||||||||. [29s:0s] Id = bench-0 Total Messages = 50000 Average throughput = 1.6353620750123208 MB/s Id = bench-1 Total Messages = 50001 Average throughput = 1.6310719156488749 MB/s ... Total Messages = 500005 Average throughput = 16.03808488093616 MB/s ``` Findings --------- * There doesn't seem to be any improvement wrt cpu and throughput using `BufStream` ``` ./benchmark -c 10 -m 10000 (open 10 connection with 10000 publishes each) ``` * Using codec on top of `Read` has a huge perf benefit over async MqttRead/Write There is 8X improvement in throughput ```rust // https://docs.rs/tokio/0.2.6/src/tokio/io/util/buf_reader.rs.html#117 // TODO What happens when socket receives only 4K of data and there is no new data? // TODO Also might have to periodically flush to send mqtt packets which are part of // partially filled write bufers let mut stream = BufStream::new(&mut self.stream); ``` Moooree tools ---------- #####Flamegraph ``` RUSTFLAGS="-g -C force-frame-pointers=y" cargo flamegraph rumq-cli ``` #####Perf Build in release mode with debug symbols (-g) and frame pointers enabled ``` RUSTFLAGS="-g -C force-frame-pointers=y" cargo build --release perf record -g perf report -g graph,0.5,caller ``` References ----------- * https://github.com/flamegraph-rs/flamegraph * https://gendignoux.com/blog/2019/11/09/profiling-rust-docker-perf.html