Crates.io | fft_bench |
---|---|
lib.rs | fft_bench |
version | 0.1.1 |
source | src |
created_at | 2020-07-14 06:18:52.288345 |
updated_at | 2020-07-14 06:24:26.354559 |
description | Benchmark using FFT |
homepage | https://github.com/zuntan/fft_bench |
repository | https://github.com/zuntan/fft_bench |
max_upload_size | |
id | 264931 |
size | 113,942 |
While developing a product that runs on a Raspberry Pi 3 (RPi3), I wanted performance figures for the RPi3.
I also found a big difference in execution speed between Rust debug and release builds.
So I wrote a simple program and used it as a benchmark.
You can run the benchmark with the following commands:
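```
$ git clone https://github.com/zuntan/fft_bench.git
$ cd fft_bench
$ cargo check
# TIME must be exported so that GNU time (run as \time, bypassing the bash keyword) reads the format
$ ( export TIME="\nTIME R:%e S:%S U:%U P:%P CMD:%C"; \time bash run_bench.sh 2>&1 ) | tee result/result.txt
```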
(I don't speak English, so I wrote this text with Google Translate. Please forgive any strange sentences.)
This program internally generates sine-wave data and analyzes it with an FFT.
The benchmark measures both the time it takes to compile the program and the time the program takes to process the data.
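The generate-then-analyze step can be illustrated in plain Rust. This is a minimal sketch, not the crate's code: fft_bench's actual FFT library, frame size, and windowing are not described here, so the sketch assumes a hand-written recursive radix-2 Cooley-Tukey FFT over a single 4096-sample frame at an assumed 44.1 kHz sample rate. The helper names `chord`, `fft`, and `peak_frequency` are hypothetical.

```rust
use std::f64::consts::PI;
use std::ops::{Add, Mul, Sub};

/// Minimal complex number (the standard library has none).
#[derive(Clone, Copy, Debug)]
struct Complex {
    re: f64,
    im: f64,
}

impl Add for Complex {
    type Output = Complex;
    fn add(self, o: Complex) -> Complex {
        Complex { re: self.re + o.re, im: self.im + o.im }
    }
}

impl Sub for Complex {
    type Output = Complex;
    fn sub(self, o: Complex) -> Complex {
        Complex { re: self.re - o.re, im: self.im - o.im }
    }
}

impl Mul for Complex {
    type Output = Complex;
    fn mul(self, o: Complex) -> Complex {
        Complex {
            re: self.re * o.re - self.im * o.im,
            im: self.re * o.im + self.im * o.re,
        }
    }
}

/// Recursive radix-2 Cooley-Tukey FFT; `x.len()` must be a power of two.
fn fft(x: &[Complex]) -> Vec<Complex> {
    let n = x.len();
    assert!(n.is_power_of_two(), "FFT length must be a power of two");
    if n == 1 {
        return x.to_vec();
    }
    let even: Vec<Complex> = x.iter().step_by(2).copied().collect();
    let odd: Vec<Complex> = x.iter().skip(1).step_by(2).copied().collect();
    let (e, o) = (fft(&even), fft(&odd));
    let mut out = vec![Complex { re: 0.0, im: 0.0 }; n];
    for k in 0..n / 2 {
        // Twiddle factor e^{-2*pi*i*k/n} applied to the odd half.
        let angle = -2.0 * PI * k as f64 / n as f64;
        let t = Complex { re: angle.cos(), im: angle.sin() } * o[k];
        out[k] = e[k] + t;
        out[k + n / 2] = e[k] - t;
    }
    out
}

/// Sum of unit-amplitude sine waves at `freqs` (Hz): `n` samples at `rate` Hz.
fn chord(freqs: &[f64], rate: f64, n: usize) -> Vec<f64> {
    (0..n)
        .map(|i| {
            let t = i as f64 / rate;
            freqs.iter().map(|f| (2.0 * PI * f * t).sin()).sum::<f64>()
        })
        .collect()
}

/// Frequency (Hz) of the strongest bin strictly between DC and Nyquist.
fn peak_frequency(samples: &[f64], rate: f64) -> f64 {
    let input: Vec<Complex> = samples.iter().map(|&s| Complex { re: s, im: 0.0 }).collect();
    let spectrum = fft(&input);
    let n = spectrum.len();
    let mut best = (1, 0.0);
    for k in 1..n / 2 {
        let mag = (spectrum[k].re.powi(2) + spectrum[k].im.powi(2)).sqrt();
        if mag > best.1 {
            best = (k, mag);
        }
    }
    best.0 as f64 * rate / n as f64
}

fn main() {
    let rate = 44_100.0;
    let frame = chord(&[1000.0], rate, 4096); // 1000 Hz, the `--freq` default
    println!("peak near {:.1} Hz", peak_frequency(&frame, rate));
}
```

The peak lands within one bin (44100 / 4096, about 10.8 Hz) of the generated tone; a production benchmark would normally use an optimized FFT crate instead of this recursive version.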
This program has the following options.
```
$ cargo run -- --help
    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/fft_bench --help`
Usage: target/debug/fft_bench [options]

Options:
    -w, --wav [tmp.wav] wav file output
    -x, --fftbw_wav [tmp_fftbw.wav]
                        wav file output (fft backword)
    -l, --len 15        wav file output len (sec)
    -f, --freq 1000     frequency (20-20000)
    -6, --f64           use f64 insteed of f32
    -s, --skip_fft      skip fft
    -b, --enable_fft_backward
                        enable fft_backward
    -v, --version       Print version info and exit.
        --help          Print this help menu.
```
Sample run:

```
$ cargo run --release -- -l 180 -6 -w -b -x
    Finished release [optimized] target(s) in 0.01s
     Running `target/release/fft_bench -l 180 -6 -w -b -x`
INFO fft_bench > f64 [true]
INFO fft_bench > skip_fft [false]
INFO fft_bench > wav output [tmp.wav]
INFO fft_bench > enable_fft_bw [true]
INFO fft_bench > wav output bw [tmp_fftbw.wav]
INFO fft_bench > wav len [180] sec
INFO fft_bench > freqs [440.0, 554.364990234375, 659.2550048828125]
INFO fft_bench > wav amp [3.0]
INFO fft_bench > 30 sec done
INFO fft_bench > 60 sec done
INFO fft_bench > 90 sec done
INFO fft_bench > 120 sec done
INFO fft_bench > 150 sec done
INFO fft_bench > 180 sec done
INFO fft_bench > fft count [1939]
INFO fft_bench > wav sample len [7938048]
INFO fft_bench > wav file len [15884332]
INFO fft_bench > wav time (1) [ 180.0011] sec
INFO fft_bench > proc time (2) [ 0.7240] sec
INFO fft_bench > (1) / (2) [ 248.6203]
```
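The figures in this log are internally consistent, which helps when reading the other results. The arithmetic below assumes a 44.1 kHz sample rate, 16-bit mono PCM, a canonical 44-byte WAV header, and 4096-sample FFT frames; none of these is stated in the log, they are inferred from the numbers. Under those assumptions, wav time (1) is the sample count divided by 44100, (1) / (2) is wav time divided by proc time, and the file length matches 1939 frames (the fft count) of 4096 samples, while wav sample len equals 1938 such frames. The constants and function names are hypothetical.

```rust
/// Assumed sample rate (inferred: 7938048 samples / 180.0011 s is about 44100 Hz).
const RATE: f64 = 44_100.0;
/// Assumed FFT frame size in samples.
const FRAME: u64 = 4_096;
/// Size of a canonical RIFF/WAVE PCM header in bytes.
const WAV_HEADER: u64 = 44;

/// Audio duration in seconds for a given sample count.
fn wav_seconds(samples: u64) -> f64 {
    samples as f64 / RATE
}

/// File size in bytes for 16-bit mono PCM (2 bytes per sample).
fn pcm16_mono_file_len(samples: u64) -> u64 {
    WAV_HEADER + 2 * samples
}

fn main() {
    // Values copied from the sample log.
    let (sample_len, fft_count, proc_time) = (7_938_048u64, 1_939u64, 0.7240);
    let wav_time = wav_seconds(sample_len);
    println!("wav time    {:.4} sec", wav_time); // log: 180.0011
    println!("(1) / (2)   {:.4}", wav_time / proc_time); // log: 248.6203
    println!("frames      {} (remainder {})", sample_len / FRAME, sample_len % FRAME);
    println!("file len    {}", pcm16_mono_file_len(fft_count * FRAME)); // log: 15884332
}
```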
The benchmark (`run_bench.sh`) measures both the compile time of this project and the program's execution time.
The first half measures how fast cargo (Rust) compiles the project.
The second half measures the execution time of the program. The differences in FFT processing between f32 and f64, and between debug and release builds, are interesting.
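Inside a Rust program, measuring execution time comes down to timing a workload with a monotonic clock. A minimal sketch, assuming the program does something similar; `time_it` and `realtime_factor` are hypothetical helpers, and the workload here is plain sine generation rather than the real FFT pipeline. The compile-time half is measured outside the program (the command above wraps `run_bench.sh` in GNU `time`).

```rust
use std::f64::consts::PI;
use std::time::Instant;

/// Runs `work` once and returns its result plus the elapsed wall-clock seconds.
fn time_it<T, F: FnOnce() -> T>(work: F) -> (T, f64) {
    let start = Instant::now(); // monotonic clock, unaffected by system time changes
    let result = work();
    (result, start.elapsed().as_secs_f64())
}

/// Seconds of audio processed per second of wall-clock time,
/// i.e. what the log reports as "(1) / (2)".
fn realtime_factor(audio_secs: f64, proc_secs: f64) -> f64 {
    audio_secs / proc_secs
}

fn main() {
    let rate = 44_100usize; // assumed sample rate
    let secs = 15; // the `--len` default
    let (checksum, elapsed) = time_it(|| {
        (0..rate * secs)
            .map(|i| (2.0 * PI * 1000.0 * i as f64 / rate as f64).sin())
            .sum::<f64>()
    });
    // Printing the checksum keeps the optimizer from discarding the work.
    println!(
        "checksum {:.3}, proc time {:.4} sec, factor {:.1}",
        checksum,
        elapsed,
        realtime_factor(secs as f64, elapsed)
    );
}
```

Running this with and without `--release` shows the same kind of debug/release gap the tables below report.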
The execution runs use the following option combinations:

Options | Description
---|---
`-s` | skip_fft no_output
`-s -w` | skip_fft wave_output_only
(none) | enable_fft no_output
`-b` | enable_fft enable_fft_backword no_output
`-w` | enable_fft wave_output
`-b -x` | enable_fft enable_fft_backword wave_output wave_fft_backword_output
`-6` | f64 enable_fft no_output
`-6 -b` | f64 enable_fft enable_fft_backword no_output
`-6 -w` | f64 enable_fft wave_output
`-6 -w -b -x` | f64 enable_fft enable_fft_backword wave_output wave_fft_backword_output

It also runs `cat /proc/cpuinfo`, `free -h`, and `cat /proc/meminfo`.
Running this benchmark produces tmp.wav and tmp_fftbw.wav in the current directory.
tmp.wav contains an A major chord (A, C#, and E) that lasts 180 seconds.
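Writing such a file needs only the standard library. The sketch below is hypothetical (`write_wav_mono16` and `a_major_chord` are illustrative names) and assumes 44.1 kHz, 16-bit, mono PCM with a canonical 44-byte header; that layout is consistent with the sample log, where the wav file len of 15884332 bytes equals 44 + 2 × 7942144. It also assumes each tone is scaled by 1/3 so the sum fits in `i16`; how fft_bench applies its `wav amp [3.0]` setting is not shown here.

```rust
use std::f64::consts::PI;
use std::io::{self, Write};

/// Writes a canonical 44-byte RIFF/WAVE header followed by 16-bit mono PCM samples.
fn write_wav_mono16<W: Write>(out: &mut W, rate: u32, samples: &[i16]) -> io::Result<()> {
    let data_len = (samples.len() * 2) as u32;
    out.write_all(b"RIFF")?;
    out.write_all(&(36 + data_len).to_le_bytes())?; // remaining file size
    out.write_all(b"WAVE")?;
    out.write_all(b"fmt ")?;
    out.write_all(&16u32.to_le_bytes())?; // fmt chunk size
    out.write_all(&1u16.to_le_bytes())?; // format 1 = integer PCM
    out.write_all(&1u16.to_le_bytes())?; // one channel
    out.write_all(&rate.to_le_bytes())?;
    out.write_all(&(rate * 2).to_le_bytes())?; // byte rate = rate * channels * 2
    out.write_all(&2u16.to_le_bytes())?; // block align
    out.write_all(&16u16.to_le_bytes())?; // bits per sample
    out.write_all(b"data")?;
    out.write_all(&data_len.to_le_bytes())?;
    for s in samples {
        out.write_all(&s.to_le_bytes())?;
    }
    Ok(())
}

/// A major chord (A4, C#5, E5), averaged so the sum stays within the i16 range.
fn a_major_chord(rate: u32, secs: u32) -> Vec<i16> {
    let freqs = [440.0, 554.365, 659.255];
    (0..rate * secs)
        .map(|i| {
            let t = i as f64 / rate as f64;
            let sum: f64 = freqs.iter().map(|f| (2.0 * PI * f * t).sin()).sum();
            (sum / freqs.len() as f64 * i16::MAX as f64) as i16
        })
        .collect()
}

fn main() -> io::Result<()> {
    let samples = a_major_chord(44_100, 1);
    let mut buf = Vec::new(); // a std::fs::File works the same way
    write_wav_mono16(&mut buf, 44_100, &samples)?;
    println!("{} bytes", buf.len());
    Ok(())
}
```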
tmp_fftbw.wav contains the tmp.wav data after a forward FFT followed by an inverse FFT.
Try listening to both files.
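tmp_fftbw.wav should sound the same as tmp.wav because an inverse FFT undoes a forward FFT up to floating-point rounding. The sketch below checks that round trip on one frame. It assumes a hand-written radix-2 FFT (the crate's real FFT library isn't shown here) and computes the inverse with the conjugation identity IFFT(x) = conj(FFT(conj(x))) / n; the names `fft`, `ifft`, and `round_trip_error` are hypothetical.

```rust
use std::f64::consts::PI;

#[derive(Clone, Copy, Debug)]
struct C {
    re: f64,
    im: f64,
}

/// Recursive radix-2 FFT; `x.len()` must be a power of two.
fn fft(x: &[C]) -> Vec<C> {
    let n = x.len();
    assert!(n.is_power_of_two(), "FFT length must be a power of two");
    if n == 1 {
        return x.to_vec();
    }
    let even: Vec<C> = x.iter().step_by(2).copied().collect();
    let odd: Vec<C> = x.iter().skip(1).step_by(2).copied().collect();
    let (e, o) = (fft(&even), fft(&odd));
    let mut out = vec![C { re: 0.0, im: 0.0 }; n];
    for k in 0..n / 2 {
        let a = -2.0 * PI * k as f64 / n as f64;
        let (wr, wi) = (a.cos(), a.sin());
        // t = e^{-2*pi*i*k/n} * odd[k]
        let t = C {
            re: wr * o[k].re - wi * o[k].im,
            im: wr * o[k].im + wi * o[k].re,
        };
        out[k] = C { re: e[k].re + t.re, im: e[k].im + t.im };
        out[k + n / 2] = C { re: e[k].re - t.re, im: e[k].im - t.im };
    }
    out
}

/// Inverse FFT via conjugation: ifft(x) = conj(fft(conj(x))) / n.
fn ifft(x: &[C]) -> Vec<C> {
    let n = x.len() as f64;
    let conj: Vec<C> = x.iter().map(|c| C { re: c.re, im: -c.im }).collect();
    fft(&conj)
        .into_iter()
        .map(|c| C { re: c.re / n, im: -c.im / n })
        .collect()
}

/// Largest absolute difference between `samples` and their FFT -> IFFT round trip.
fn round_trip_error(samples: &[f64]) -> f64 {
    let input: Vec<C> = samples.iter().map(|&s| C { re: s, im: 0.0 }).collect();
    let restored = ifft(&fft(&input));
    samples
        .iter()
        .zip(&restored)
        .map(|(a, b)| (a - b.re).abs())
        .fold(0.0, f64::max)
}

fn main() {
    // One 4096-sample frame of the A, C#, E chord at an assumed 44.1 kHz.
    let frame: Vec<f64> = (0..4096)
        .map(|i| {
            let t = i as f64 / 44_100.0;
            [440.0, 554.365, 659.255]
                .iter()
                .map(|f| (2.0 * PI * f * t).sin())
                .sum::<f64>()
        })
        .collect();
    println!("max round-trip error: {:e}", round_trip_error(&frame));
}
```

The error stays many orders of magnitude below one 16-bit quantization step, which is why the two files should be indistinguishable by ear.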
The results of running the benchmark on my server (X86) and on a Raspberry Pi 3B are in the `result` directory and are summarized in the tables below.
Command | X86 (sec) | X86 (CPU) | RPi 3B (sec) | RPi 3B (CPU) | RPi 3B : X86
---|---|---|---|---|---
cargo check 1 | 3.66 | 355% | 48.48 | 303% | 13.25
cargo check 2 | 0.07 | 101% | 0.61 | 102% | 8.71
cargo build 1 | 6.15 | 488% | 97.84 | 343% | 15.91
cargo build 2 | 0.92 | 102% | 12.91 | 100% | 14.03
cargo build --release 1 | 9.65 | 824% | 177.4 | 285% | 18.38
cargo build --release 2 | 1.38 | 413% | 18.76 | 278% | 13.59
cargo build 1 (cross arm7) | 6.61 | 458% | 96.10 | 345% | 14.54
cargo build 2 (cross arm7) | 1.36 | 101% | 12.91 | 100% | 9.49
cargo build --release 1 (cross arm7) | 9.35 | 845% | 176.36 | 383% | 18.86
cargo build --release 2 (cross arm7) | 1.47 | 389% | 18.08 | 278% | 12.30
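At compile time, the X86 machine is roughly 9 to 14 times faster than the RPi 3B for the second run of each command, and 13 to 19 times faster for a clean build (the rows marked 1).
Compiling a release build takes about 1.5 times as long as a debug build (roughly 1.1 to 1.8 times depending on the row), which is not a big difference.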
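The RPi 3B : X86 column is simply the RPi 3B time divided by the X86 time. This short sketch recomputes it from the compile-time table and reports the range separately for the rows marked 1 and 2; the data layout and the `slowdown_range` helper are illustrations, not part of fft_bench.

```rust
/// (command, run, X86 seconds, RPi 3B seconds), copied from the compile-time table.
const ROWS: [(&str, u8, f64, f64); 10] = [
    ("cargo check", 1, 3.66, 48.48),
    ("cargo check", 2, 0.07, 0.61),
    ("cargo build", 1, 6.15, 97.84),
    ("cargo build", 2, 0.92, 12.91),
    ("cargo build --release", 1, 9.65, 177.4),
    ("cargo build --release", 2, 1.38, 18.76),
    ("cargo build (cross arm7)", 1, 6.61, 96.10),
    ("cargo build (cross arm7)", 2, 1.36, 12.91),
    ("cargo build --release (cross arm7)", 1, 9.35, 176.36),
    ("cargo build --release (cross arm7)", 2, 1.47, 18.08),
];

/// Minimum and maximum of (RPi 3B time / X86 time) over rows with the given run number.
fn slowdown_range(run: u8) -> (f64, f64) {
    ROWS.iter()
        .filter(|row| row.1 == run)
        .map(|row| row.3 / row.2)
        .fold((f64::INFINITY, f64::NEG_INFINITY), |(lo, hi), r| (lo.min(r), hi.max(r)))
}

fn main() {
    for run in [1u8, 2] {
        let (lo, hi) = slowdown_range(run);
        println!("run {}: X86 is {:.2}x to {:.2}x faster", run, lo, hi);
    }
}
```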
In my experience, compiling a large project on an RPi 3B can take 10 minutes or more, so repeatedly building on the RPi 3B is inefficient. Developing in a cross-compilation environment is more efficient.
Execution times in seconds. (D) = debug build, (R) = release build, * = enabled.

Options | FFT | FFT-BW | FILE-OUT | f64 | X86 (D) | RPi 3B (D) | X86 (R) | RPi 3B (R) | X86 (D)/(R) | RPi 3B (D)/(R) | RPi 3B (D) : X86 (D) | RPi 3B (R) : X86 (R)
---|---|---|---|---|---|---|---|---|---|---|---|---
`-s` | | | | | 4.44 | 31.38 | 0.29 | 3.45 | 15.31 | 9.10 | 7.07 | 11.90
`-s -w` | | | * | | 4.02 | 21.77 | 0.31 | 3.58 | 12.97 | 6.08 | 5.42 | 11.55
(none) | * | | | | 8.33 | 35.60 | 0.37 | 4.38 | 22.51 | 8.13 | 4.27 | 11.84
`-b` | * | * | | | 13.98 | 57.38 | 0.45 | 5.28 | 31.07 | 10.87 | 4.10 | 11.73
`-w` | * | | * | | 8.80 | 38.24 | 0.38 | 4.56 | 23.16 | 8.39 | 4.35 | 12.00
`-b -x` | * | * | * | | 15.18 | 65.47 | 0.49 | 6.25 | 30.98 | 10.48 | 4.31 | 12.76
`-6` | * | | | * | 6.52 | 37.12 | 0.59 | 5.85 | 11.05 | 6.35 | 5.69 | 9.92
`-6 -b` | * | * | | * | 10.24 | 59.81 | 0.70 | 7.12 | 14.63 | 8.40 | 5.84 | 10.17
`-6 -w` | * | | * | * | 7.00 | 41.56 | 0.61 | 6.05 | 11.48 | 6.87 | 5.94 | 9.92
`-6 -w -b -x` | * | * | * | * | 11.26 | 68.17 | 0.74 | 8.52 | 15.22 | 8.00 | 6.05 | 11.51

There is a very large difference in execution speed between debug and release builds: the debug build is about 11 to 31 times slower on X86 and about 6 to 11 times slower on the RPi 3B. [ X86 (D)/(R), RPi 3B (D)/(R) ]
***In any case, the debug build runs far too slowly.***
With release builds, the RPi 3B is about 10 to 13 times slower than X86. [ RPi 3B (R) : X86 (R) ]
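The ratio columns of the execution table follow directly from its four timing columns: X86 (D)/(R) and RPi 3B (D)/(R) divide debug time by release time, while the two RPi 3B : X86 columns divide the RPi 3B time by the X86 time for the same build type. The sketch below recomputes them for every row; it is an illustration only, with the timings copied from the table.

```rust
/// (options, X86 debug, RPi 3B debug, X86 release, RPi 3B release), in seconds,
/// copied from the execution-time table.
const RUNS: [(&str, f64, f64, f64, f64); 10] = [
    ("-s", 4.44, 31.38, 0.29, 3.45),
    ("-s -w", 4.02, 21.77, 0.31, 3.58),
    ("(none)", 8.33, 35.60, 0.37, 4.38),
    ("-b", 13.98, 57.38, 0.45, 5.28),
    ("-w", 8.80, 38.24, 0.38, 4.56),
    ("-b -x", 15.18, 65.47, 0.49, 6.25),
    ("-6", 6.52, 37.12, 0.59, 5.85),
    ("-6 -b", 10.24, 59.81, 0.70, 7.12),
    ("-6 -w", 7.00, 41.56, 0.61, 6.05),
    ("-6 -w -b -x", 11.26, 68.17, 0.74, 8.52),
];

fn main() {
    println!(
        "{:<12} {:>9} {:>9} {:>9} {:>9}",
        "options", "X86 D/R", "RPi D/R", "RPi:X86 D", "RPi:X86 R"
    );
    for &(opts, xd, rd, xr, rr) in RUNS.iter() {
        // debug/release per machine, then RPi/X86 per build type
        println!(
            "{:<12} {:>9.2} {:>9.2} {:>9.2} {:>9.2}",
            opts,
            xd / xr,
            rd / rr,
            rd / xd,
            rr / xr
        );
    }
}
```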
Depending on the project, rather than developing in Rust directly on the RPi 3B, it seems more efficient to do most of the development on an X86 machine and then run a cross-compiled build on the RPi 3B.
When doing so, develop with the expectation that the RPi 3B is roughly 10 times slower.
If you use the RPi 3B, ***be sure to use the release build.*** The debug build runs far too slowly.
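One way to avoid benchmarking a debug build by accident is to have the program report its build profile. This is a hypothetical addition, not something fft_bench is known to do; it relies on `cfg!(debug_assertions)`, which Cargo enables by default for `cargo build` and disables for `cargo build --release` (a `[profile]` setting in `Cargo.toml` can override either). The `build_profile` helper is an illustrative name.

```rust
/// Reports which build profile this binary came from.
/// `debug_assertions` is on by default in Cargo's dev profile and off in release.
fn build_profile() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}

fn main() {
    if build_profile() == "debug" {
        eprintln!("warning: debug build; timings will be far slower than `cargo run --release`");
    }
    println!("build profile: {}", build_profile());
}
```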