Tutorial
======

**Table of Contents**
- [Coding](#coding)
- [Compiling](#compiling)
- [Running](#running)
- [Testing](#testing)

> Before getting started, it is best to [test](#testing) QuEST on your hardware.

# Coding

QuEST can be integrated into your C or C++ project, simply by including

```C
#include <QuEST.h>
```

Your simulation code will look the same and compile with the same build system, regardless of whether it is run in multithreaded, GPU-accelerated or distributed mode. For example, here is a platform-agnostic simulation of a very simple circuit which produces and measures the state ![equation](https://latex.codecogs.com/gif.latex?C_0%28X_1%29%20H_0%20%7C00%5Crangle)

```C
#include <QuEST.h>

int main() {

  // load QuEST
  QuESTEnv env = createQuESTEnv();

  // create a 2 qubit register in the zero state
  Qureg qubits = createQureg(2, env);
  initZeroState(qubits);

  // apply circuit
  hadamard(qubits, 0);
  controlledNot(qubits, 0, 1);
  measure(qubits, 1);

  // unload QuEST
  destroyQureg(qubits, env);
  destroyQuESTEnv(env);
  return 0;
}
```

Of course, this code doesn't output anything!

----------------------

Let's walk through a more sophisticated circuit.

We first construct a QuEST environment with [`createQuESTEnv()`](https://quest-kit.github.io/QuEST/group__type.html#ga8ba2c3388dd64d9348c3b091852d36d4), which abstracts away any preparation of multithreading, distribution or GPU-acceleration strategies.
```C
QuESTEnv env = createQuESTEnv();
```

We then create a quantum register, in this case containing 3 qubits, via [`createQureg()`](https://quest-kit.github.io/QuEST/group__type.html#ga3392816c0643414165c2f5caeec17df0)
```C
Qureg qubits = createQureg(3, env);
```
and [initialise](https://quest-kit.github.io/QuEST/group__init.html) the register.
```C
initZeroState(qubits);
```
We can create multiple `Qureg` instances, and QuEST will sort out allocating memory for the state-vectors, even over networks!

If we wanted to simulate noise in our circuit, we could replace `createQureg` with [`createDensityQureg`](https://quest-kit.github.io/QuEST/group__type.html#ga93e55b6650b408abb30a1d4a8bce757c) to create a more powerful density matrix, capable of representing mixed states and simulating [decoherence](https://quest-kit.github.io/QuEST/group__decoherence.html) (a short sketch of this appears at the end of this section).

We're now ready to apply some [unitaries](https://quest-kit.github.io/QuEST/group__unitary.html) to our qubits, which in this case have indices `0`, `1` and `2`. When applying an operator, we pass along which quantum register to operate upon.
```C
hadamard(qubits, 0);
controlledNot(qubits, 0, 1);
rotateY(qubits, 2, .1);
```

Some gates allow us to specify a general number of control qubits
```C
int controls[] = {0, 1, 2};
multiControlledPhaseGate(qubits, controls, 3);
```

We can specify general single-qubit unitary operations as 2x2 matrices
```C
// sqrt(X) with a pi/4 global phase
ComplexMatrix2 u = {
    .real = {{.5, .5}, { .5,.5}},
    .imag = {{.5,-.5}, {-.5,.5}}};
unitary(qubits, 0, u);
```
or more compactly, foregoing the global phase factor,
```C
Complex a = {.real = .5, .imag = .5};
Complex b = {.real = .5, .imag =-.5};
compactUnitary(qubits, 1, a, b);
```
or even more compactly, as a rotation around an arbitrary axis on the Bloch-sphere
```C
Vector v = {.x=1, .y=0, .z=0};
rotateAroundAxis(qubits, 2, 3.14/2, v);
```

We can controlled-apply general unitaries
```C
controlledCompactUnitary(qubits, 0, 1, a, b);
```
even with multiple control qubits!
```C
multiControlledUnitary(qubits, (int[]) {0, 1}, 2, 2, u);
```

There are many questions and [calculations](https://quest-kit.github.io/QuEST/group__calc.html) we can now ask of our quantum register.
```C
qreal prob = getProbAmp(qubits, 7);
printf("Probability amplitude of |111>: %lf\n", prob);
```
Here, `qreal` is an alias for a real floating-point number, like `double`. This keeps our code precision-agnostic: we can change the numerical precision at compile time (via the build option `PRECISION`) without modifying our code. Changing the precision can be useful for verifying numerical convergence or studying rounding errors.

How probable is measuring our final qubit (with index `2`) in outcome `1`?
```C
prob = calcProbOfOutcome(qubits, 2, 1);
printf("Probability of qubit 2 being in state 1: %f\n", prob);
```

We can also perform [non-unitary gates](https://quest-kit.github.io/QuEST/group__normgate.html) upon the state. Let's destructively measure the first qubit, randomly collapsing it into outcome `0` or `1`
```C
int outcome = measure(qubits, 0);
printf("Qubit 0 was measured in state %d\n", outcome);
```
and now measure our final qubit, while also learning the probability of its outcome.
```C
outcome = measureWithStats(qubits, 2, &prob);
printf("Qubit 2 collapsed to %d with probability %f\n", outcome, prob);
```

We could even apply [non-physical operators](https://quest-kit.github.io/QuEST/group__operator.html) to our register, breaking its normalisation, which can often allow us to take computational shortcuts like [this one](https://arxiv.org/abs/2009.02823).

At the conclusion of our circuit, we should free up the memory used by our quantum registers.
```C
destroyQureg(qubits, env);
destroyQuESTEnv(env);
```

The effect of the [code above](tutorial_example.c) is to simulate the circuit below.
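As a brief aside before compiling: the noisy variant mentioned earlier (swapping `createQureg` for `createDensityQureg`) requires only small changes to the walkthrough. Below is a minimal sketch, not part of the tutorial code, which assumes the [decoherence](https://quest-kit.github.io/QuEST/group__decoherence.html) channel `mixDephasing`; substitute whichever noise channels suit your model.

```C
#include <QuEST.h>

// illustrative sketch only (not part of tutorial_example.c)
int main() {
  QuESTEnv env = createQuESTEnv();

  // a density matrix can represent mixed states
  Qureg rho = createDensityQureg(3, env);
  initZeroState(rho);

  // the same gate functions accept density matrices
  hadamard(rho, 0);
  controlledNot(rho, 0, 1);

  // dephase qubit 0 with probability 0.1
  mixDephasing(rho, 0, 0.1);

  destroyQureg(rho, env);
  destroyQuESTEnv(env);
  return 0;
}
```

Note that decoherence channels operate only on density matrices; a state-vector `Qureg` cannot represent the resulting mixed state.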
After compiling (see the section below) and running, the tutorial code gives pseudo-random output such as

> ```
> Probability amplitude of |111>: 0.498751
> Probability of qubit 2 being in state 1: 0.749178
> Qubit 0 was measured in state 1
> Qubit 2 collapsed to 1 with probability 0.998752
> ```
> ```
> Probability amplitude of |111>: 0.498751
> Probability of qubit 2 being in state 1: 0.749178
> Qubit 0 was measured in state 0
> Qubit 2 collapsed to 1 with probability 0.499604
> ```

QuEST uses the [Mersenne Twister](http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html) algorithm to generate the random numbers used for randomly collapsing quantum states. The user can seed this RNG using [`seedQuEST()`](https://quest-kit.github.io/QuEST/group__debug.html#ga555451c697ea4a9d27389155f68fdabc); otherwise QuEST will by default create a seed from the current time and the process id.

> In distributed mode (see below), all code in your source files will be executed independently on every node.
> To execute some code (e.g. printing) only on one node, use
> ```C
> QuESTEnv env = createQuESTEnv();
>
> if (env.rank == 0)
>     printf("Only one node executes this print!");
> ```
> Such conditions are valid and always satisfied in code run on a single node.

----------------------------

# Compiling

QuEST uses [CMake](https://cmake.org/) (version `3.7` or higher) as its build system. Configure the build by supplying the below `-D[VAR=VALUE]` options after the `cmake ..` command. You can alternatively compile via [GNU Make](https://www.gnu.org/software/make/) directly with the provided [makefile](makefile).

> **Windows** users should install [CMake](https://cmake.org/download/) and [Build Tools](https://visualstudio.microsoft.com/downloads/#build-tools-for-visual-studio-2019), and run the below commands in the *Developer Command Prompt for VS*.

To compile, run:
```console
mkdir build
cd build
cmake .. -DUSER_SOURCE="[FILENAME]"
make
```
where `[FILENAME]` is the name of your source file, including the file extension, relative to the root QuEST directory (above `build`).

> Windows users should replace the final two build commands with
> ```bash
> cmake .. -G "NMake Makefiles"
> nmake
> ```
> If using MSVC and NMake in this way fails, users can forego GPU acceleration, download
> [MinGW-w64](https://sourceforge.net/projects/mingw-w64/), and compile via
> ```bash
> cmake .. -G "MinGW Makefiles"
> make
> ```
> Compiling directly with `make` and the provided [makefile](makefile), copied to the root directory, may prove easier.

If your project contains multiple source files, separate them with semicolons. For example,
```console
-DUSER_SOURCE="source1.c;source2.cpp"
```

- To set the C compiler used by CMake (to e.g. `gcc-6`), use
  ```console
  -DCMAKE_C_COMPILER=gcc-6
  ```
  and similarly to set the C++ compiler (as used in GPU mode), use
  ```console
  -DCMAKE_CXX_COMPILER=g++-6
  ```

- If you wish your executable to be named something other than `demo`, you can set this too by adding the argument
  ```console
  -DOUTPUT_EXE="myExecutable"
  ```

- To compile your code to use multithreading, for parallelism on multi-core or multi-CPU systems, use
  ```console
  -DMULTITHREADED=1
  ```
  Before launching your executable, set the number of participating threads using `OMP_NUM_THREADS`.
  For example,
  ```console
  export OMP_NUM_THREADS=16
  ./myExecutable
  ```

- To compile your code to run on distributed or networked systems, use
  ```console
  -DDISTRIBUTED=1
  ```
  Depending on your MPI implementation, your executable can be launched via
  ```console
  mpirun -np [NUM_NODES] [EXEC]
  ```
  where `[NUM_NODES]` is the number of distributed compute nodes to use, and `[EXEC]` is the name of your executable.
  Note that QuEST *hybridises* multithreading and distribution. Hence you should set `[NUM_NODES]` to equal exactly the number of distinct compute nodes (which don't share memory), and set `OMP_NUM_THREADS` as above to assign the number of threads used on *each* compute node.

- To compile for GPU, use
  ```console
  -DGPUACCELERATED=1 -DGPU_COMPUTE_CAPABILITY=[CC] ..
  ```
  where `[CC]` is the compute capability of your GPU, written without a decimal point. This can be looked up at the [NVIDIA website](https://developer.nvidia.com/cuda-gpus).

  > Note that CUDA is not compatible with all compilers. To force `cmake` to use a
  > compatible compiler, override `CMAKE_C_COMPILER` and `CMAKE_CXX_COMPILER`.
  > For example, to compile for the [Quadro P6000](https://www.pny.com/nvidia-quadro-p6000)
  > with `gcc-6`:
  > ```console
  > cmake .. -DGPUACCELERATED=1 -DGPU_COMPUTE_CAPABILITY=61 \
  >     -DCMAKE_C_COMPILER=gcc-6 -DCMAKE_CXX_COMPILER=g++-6
  > ```

- You can additionally customise the floating-point precision used by QuEST's `qreal` type, via
  ```console
  -DPRECISION=1
  -DPRECISION=2
  -DPRECISION=4
  ```
  which use single (`qreal = float`), double (`qreal = double`) and quad (`qreal = long double`) precision respectively. Using greater precision means more precise computation, but at the expense of additional memory requirements and runtime. Checking that results are unchanged when switching the precision can be a great test that your calculations are sufficiently precise.

After making changes to your code, you can quickly recompile using `make` directly, within the `build/` directory.

For a full list of available configuration parameters, use
```console
cmake -LH ..
```

For manual configuration (not recommended), you can change the `CMakeLists.txt` in the root QuEST directory. You can also modify the provided [makefile](makefile) and compile with GNU Make directly, by copying it into the root repository directory and running
```console
make
```

----------------------------

# Running

## Locally

Once compiled as above, the executable can be run locally from within the `build` directory.
```console
./myExecutable
```

- In multithreaded mode, the number of threads QuEST will use can be set by modifying `OMP_NUM_THREADS`, ideally to the number of available cores on your machine
  ```console
  export OMP_NUM_THREADS=8
  ./myExecutable
  ```

- In distributed mode, QuEST will uniformly divide every `Qureg` between a power-of-2 number of nodes, and can be launched with `mpirun`. For example, here using `8` nodes
  ```console
  mpirun -np 8 ./myExecutable
  ```
  If multithreading is also enabled, the number of threads used by each node can be set using `OMP_NUM_THREADS`. For example, here using `8` nodes with `16` threads on each (a total of `128` threads):
  ```console
  export OMP_NUM_THREADS=16
  mpirun -np 8 ./myExecutable
  ```

- In GPU mode, the executable is launched directly via
  ```console
  ./myExecutable
  ```
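Whichever mode you compiled, it can be reassuring to confirm at runtime which configuration your executable is actually using. Below is a minimal sketch, assuming QuEST's `reportQuESTEnv()` reporting utility, which prints details of the execution environment.

```C
#include <QuEST.h>

int main() {
  QuESTEnv env = createQuESTEnv();

  // sketch: print the execution environment of this build,
  // e.g. whether multithreading, distribution or GPU-acceleration are active
  reportQuESTEnv(env);

  destroyQuESTEnv(env);
  return 0;
}
```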
## On supercomputers

There are no special requirements for running QuEST through job submission systems, like [SLURM](https://slurm.schedmd.com/documentation.html). Just call `./myExecutable` as you would any other binary.

For example, the [tutorial code](tutorial_example.c) can be run on `4` distributed nodes (each with `8` cores) on a SLURM system using the following SLURM submission script
```console
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1

module load mvapich2

mkdir build
cd build
cmake .. -DDISTRIBUTED=1 -DMULTITHREADED=1
make

export OMP_NUM_THREADS=8
mpirun ./myExecutable
```

A [PBS](https://www.openpbs.org/) submission script is similar
```console
#PBS -l select=4:ncpus=8

module purge
module load mvapich2

mkdir build
cd build
cmake -DDISTRIBUTED=1 ..
make

export OMP_NUM_THREADS=8
aprun -n 4 -d 8 -cc numa_node ./myExecutable
```

Running QuEST on a GPU is just a matter of specifying the resources and the appropriate compilers
```console
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1
#SBATCH --partition=gpu    ## name may vary

module purge
module load cuda           ## name may vary

mkdir build
cd build
cmake -DGPUACCELERATED=1 -DGPU_COMPUTE_CAPABILITY=[Compute capability] ..
make

./myExecutable
```

On each platform, there is no change to our source code or our QuEST interface. We simply recompile, and QuEST will utilise the available hardware (a GPU, shared-memory or distributed CPUs) to speed up our code.

----------------------------

# Testing

QuEST includes a comprehensive set of unit tests, to ensure every function performs correctly. These are located in the [tests](../tests) directory (documented [here](https://quest-kit.github.io/QuEST/group__unittest.html)), and compare QuEST's optimised routines to slower, algorithmically distinct methods (documented [here](https://quest-kit.github.io/QuEST/group__testutilities.html)). It is a good idea to run these tests on your machine to check QuEST is properly configured, and especially so in GPU mode, to check you have correctly set [`GPU_COMPUTE_CAPABILITY`](https://developer.nvidia.com/cuda-gpus).

Tests should be compiled in a build directory within the root QuEST directory.
```console
mkdir build
cd build
```

To compile, run:
```console
cmake .. -DTESTING=ON
make
```
You can include additional CMake arguments to target your desired hardware, such as `-DDISTRIBUTED=1`.

Next, to launch all unit tests, run:
```console
make test
```
You should see each function being tested in turn; some will be very fast, and some very slow.

> This is because the tests run functions with every one of their possible inputs
> (where possible).
> Functions with more possible inputs will hence take longer to test.
> The difference in testing time between different functions can hence be very large, and does not indicate a testing nor performance problem.

For example:
```
      Start  1: calcDensityInnerProduct
 1/117 Test  #1: calcDensityInnerProduct .............   Passed    0.16 sec
      Start  2: calcExpecDiagonalOp
 2/117 Test  #2: calcExpecDiagonalOp .................   Passed    0.07 sec
      Start  3: calcExpecPauliHamil
 3/117 Test  #3: calcExpecPauliHamil .................   Passed    0.64 sec
      Start  4: calcExpecPauliProd
 4/117 Test  #4: calcExpecPauliProd ..................   Passed   94.88 sec
```

You can also run the executable `build/tests/tests` directly, to see more statistics, and to make use of the Catch2 [command-line](https://github.com/catchorg/Catch2/blob/devel/docs/command-line.md)
```console
./tests/tests

===============================================================================
All tests passed (99700 assertions in 117 test cases)
```

Running the executable directly is also necessary for testing in distributed mode:
```console
mpirun -np 8 tests/tests
```

Using the [command-line](https://github.com/catchorg/Catch2/blob/devel/docs/command-line.md) is especially useful for contributors to QuEST, for example to run only their new function:
```console
./tests/tests myNewFunction
```
or a sub-test within:
```console
./tests/tests myNewFunction -c "correctness" -c "density-matrix" -c "unnormalised"
```

Ideally, a new function should have its unit test run in every configuration of hardware (including #threads and #nodes) and precision. The below bash script automates this.
```bash
export f=myNewFunction      # function to test
export cc=30                # GPU compute-capability
export nt=16                # number of CPU threads

test() {
    cmake .. -DTESTING=ON -DPRECISION=$p \
        -DMULTITHREADED=$mt -DDISTRIBUTED=$d \
        -DGPUACCELERATED=$ga -DGPU_COMPUTE_CAPABILITY=$cc
    # insert additional cmake params here, if needed
    make
    export OMP_NUM_THREADS=$nt
    if (( $d == 1 )); then
        mpirun -np $nn ./tests/tests $f
    else
        ./tests/tests $f
    fi
}

# precision
for p in 1 2 4; do

    # serial
    mt=0 d=0 ga=0 test

    # multithreaded
    mt=1 d=0 ga=0 test

    # gpu
    mt=0 d=0 ga=1 test

    # distributed (+multithreaded)
    for nn in 2 4 8 16; do
        mt=1 d=1 ga=0 test
    done
done
```
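Since the script calls `cmake ..`, it should be invoked from within the `build` directory. For instance, if it were saved under a hypothetical name `testAllConfigs.sh` in the QuEST root directory:

```console
cd build
bash ../testAllConfigs.sh    # hypothetical filename for the script above
```

You may also wish to trim the configuration loops to match the hardware you actually have, e.g. removing the GPU case on a CPU-only machine.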