--------------------------------
   LAMMPS ACCELERATOR LIBRARY
--------------------------------

W. Michael Brown (ORNL)
Trung Dac Nguyen (ORNL)
Peng Wang (NVIDIA)
Axel Kohlmeyer (Temple)
Steve Plimpton (SNL)
Inderaj Bains (NVIDIA)

-------------------------------------------------------------------

This directory has source files to build a library that LAMMPS links
against when using the GPU package.

This library must be built with a C++ compiler, before LAMMPS is
built, so LAMMPS can link against it.

You can type "make lib-gpu" from the src directory to see help on how
to build this library via make commands, or you can do the same thing
by typing "python Install.py" from within this directory, or you can
do it manually by following the instructions below.

Build the library using one of the provided Makefile.* files or create
your own, specific to your compiler and system.  For example:

make -f Makefile.linux

When you are done building this library, two files should exist in
this directory:

libgpu.a		the library LAMMPS will link against
Makefile.lammps		settings the LAMMPS Makefile will import

Makefile.lammps is created by the make command, by copying one of the
Makefile.lammps.* files.  See the EXTRAMAKE setting at the top of the
Makefile.* files.

IMPORTANT: You should examine the final Makefile.lammps to ensure it
is correct for your system, else the LAMMPS build can fail.

IMPORTANT: If you re-build the library, e.g. for a different precision
(see below), you should do a "make clean" first, e.g. make -f
Makefile.linux clean, to ensure all previously derived files are
removed before the new build is done.

Makefile.lammps has settings for 3 variables:

user-gpu_SYSINC = leave blank for this package
user-gpu_SYSLIB = CUDA libraries needed by this package
user-gpu_SYSPATH = path(s) to where those libraries are

Because you have the CUDA compilers on your system, you should have
the needed libraries.  If the CUDA development tools were installed in
the standard manner, the settings in the Makefile.lammps.standard file
should work.  A sketch of typical settings is given at the end of the
GENERAL NOTES section below.

-------------------------------------------------------------------

GENERAL NOTES
--------------------------------

This library, libgpu.a, provides routines for GPU acceleration of
certain LAMMPS styles and neighbor list builds.  Compilation of this
library requires installing the CUDA GPU driver and CUDA toolkit for
your operating system.  Installation of the CUDA SDK is not necessary.
In addition to the library, the binary nvc_get_devices will also be
built.  This can be used to query the names and properties of the GPU
devices on your system.  A Makefile for OpenCL compilation is
provided, but support for OpenCL use is not currently provided by the
developers.

Details of the implementation are provided in:

----

Brown, W.M., Wang, P., Plimpton, S.J., Tharrington, A.N. Implementing
Molecular Dynamics on Hybrid High Performance Computers - Short Range
Forces. Computer Physics Communications. 2011. 182: p. 898-911.

and

Brown, W.M., Kohlmeyer, A., Plimpton, S.J., Tharrington, A.N.
Implementing Molecular Dynamics on Hybrid High Performance Computers -
Particle-Particle Particle-Mesh. Computer Physics Communications.
2012. 183: p. 449-459.

and

Brown, W.M., Yamada, M. Implementing Molecular Dynamics on Hybrid High
Performance Computers - Three-Body Potentials. Computer Physics
Communications. 2013. 184: p. 2785-2793.

----

NOTE: Installation of the CUDA SDK is not required, only the CUDA
      toolkit itself or an OpenCL 1.2 compatible header and library.
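As mentioned in the build instructions above, Makefile.lammps must
provide the three SYSINC/SYSLIB/SYSPATH settings for the LAMMPS build.
As a hedged sketch, for a CUDA toolkit installed in the standard
/usr/local/cuda location, the settings copied from
Makefile.lammps.standard look roughly like the following (variable
names as they appear in that file; the path is an assumption, so point
it at the directory that actually contains libcudart.so on your
system):

gpu_SYSINC =
gpu_SYSLIB = -lcudart -lcuda
gpu_SYSPATH = -L/usr/local/cuda/lib64

If your toolkit or driver libraries live elsewhere (e.g. provided by
environment modules on a cluster), adjust gpu_SYSPATH accordingly
before building LAMMPS.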
Pair styles supporting GPU acceleration via this library are marked in
the list of pair style potentials with a "g".  See the online version
at:

https://lammps.sandia.gov/doc/Commands_pair.html

In addition, the (plain) pppm kspace style is supported as well.

MULTIPLE LAMMPS PROCESSES
--------------------------------

Multiple LAMMPS MPI processes can share GPUs on the system, but
multiple GPUs cannot be utilized by a single MPI process.  In many
cases, the best performance will be obtained by running as many MPI
processes as there are CPU cores available, with the condition that
the number of MPI processes is an integer multiple of the number of
GPUs being used.  See the LAMMPS user manual for details on running
with GPU acceleration.

BUILDING AND PRECISION MODES
--------------------------------

To build, edit the CUDA_ARCH, CUDA_PRECISION, and CUDA_HOME variables
in one of the Makefiles (example settings are sketched at the end of
this section).  CUDA_ARCH should be set based on the compute
capability of your GPU.  This can be verified by running the
nvc_get_devices executable after the build is complete.  Additionally,
the GPU package must be installed and compiled for LAMMPS.  This may
require editing the gpu_SYSPATH variable in the LAMMPS makefile.

Please note that the GPU library accesses the CUDA driver library
directly, so it needs to be linked not only against the CUDA runtime
library (libcudart.so) that ships with the CUDA toolkit, but also
against the CUDA driver library (libcuda.so) that ships with the
Nvidia driver.  If you are compiling LAMMPS on the head node of a GPU
cluster, this library may not be installed, so you may need to copy it
over from one of the compute nodes (best into this directory).
Recent CUDA toolkits, starting from CUDA 9, provide a stub libcuda.so
library that can be used for linking (but not for running).

The gpu library supports 3 precision modes as determined by the
CUDA_PRECISION variable:

CUDA_PRECISION = -D_SINGLE_SINGLE  # Single precision for all calculations
CUDA_PRECISION = -D_DOUBLE_DOUBLE  # Double precision for all calculations
CUDA_PRECISION = -D_SINGLE_DOUBLE  # Accumulation of forces, etc., in double

As of CUDA 7.5, only GPUs with compute capability 2.0 (Fermi) or newer
are supported, and as of CUDA 9.0, only compute capability 3.0
(Kepler) or newer is supported.  Support for GPUs older than that
requires an additional preprocessor flag and limits features; it is
kept for historical reasons only, and there is no value in trying to
use those GPUs for production calculations.  You have to make sure
that you set a CUDA_ARCH line suitable for your hardware and CUDA
toolkit version, e.g. -arch=sm_35 for a Tesla K20 or K40, or
-arch=sm_52 for a GeForce GTX Titan X.  A detailed list of GPU
architectures and CUDA-compatible GPUs can be found e.g. here:

https://en.wikipedia.org/wiki/CUDA#GPUs_supported

NOTE: when compiling with CMake, the considerations described in this
      section are handled within the CMake configuration process, so
      no separate compilation of the gpu library is required.  Also,
      this will build in support for all compute architectures that
      are supported by the CUDA toolkit version used to build the gpu
      library.

Please note the CUDA_CODE settings in Makefile.linux_multi, which
allow compiling this library with support for multiple GPU
architectures at once (a sketch follows below).  This list can be
extended for newer GPUs with newer CUDA toolkits and should make it
possible to build a single GPU library compatible with all GPUs that
are worth using for GPU acceleration and supported by the current
CUDA toolkits and drivers.
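As a hedged illustration of that multi-architecture build, the
CUDA_CODE list in Makefile.linux_multi is made up of nvcc -gencode
clauses, one per target architecture.  The exact list shipped with
your LAMMPS version may differ; a minimal sketch covering Maxwell,
Pascal, and Volta GPUs would be:

CUDA_CODE = -gencode arch=compute_50,code=[sm_50,compute_50] \
            -gencode arch=compute_60,code=[sm_60,compute_60] \
            -gencode arch=compute_70,code=[sm_70,compute_70]

Each clause embeds native code (sm_XX) plus PTX (compute_XX) for one
architecture, so the resulting library can also run, via JIT
compilation, on GPUs newer than those listed.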
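For a conventional single-architecture build, the three variables
mentioned at the start of this section are edited near the top of the
chosen Makefile before running make.  A minimal sketch, assuming a
Pascal-class GPU (compute capability 6.0) and a CUDA toolkit installed
under /usr/local/cuda (adjust both values to your system):

CUDA_HOME = /usr/local/cuda
CUDA_ARCH = -arch=sm_60
CUDA_PRECISION = -D_SINGLE_DOUBLE

If you later change CUDA_PRECISION or CUDA_ARCH, remember to run
"make -f Makefile.linux clean" first, as noted at the top of this
file.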
NOTE: The system-specific setting LAMMPS_SMALLBIG (default),
LAMMPS_BIGBIG, or LAMMPS_SMALLSMALL, if specified when building LAMMPS
(i.e. in src/MAKE/Makefile.foo), should be consistent with that
specified when building libgpu.a (i.e. by LMP_INC in the
lib/gpu/Makefile.bar).

EXAMPLE CONVENTIONAL BUILD PROCESS
--------------------------------

cd ~/lammps/lib/gpu
emacs Makefile.linux
make -f Makefile.linux
./nvc_get_devices
cd ../../src
emacs ./MAKE/Makefile.linux
make yes-asphere
make yes-kspace
make yes-gpu
make linux
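For comparison with the conventional process above, the CMake route
mentioned in the NOTE earlier builds the gpu library together with
LAMMPS in one step.  The following is a hedged sketch only: the
GPU-related option names (PKG_GPU, GPU_API, GPU_ARCH, GPU_PREC) are
assumed from recent LAMMPS versions, so check the Build documentation
of your LAMMPS version before relying on them.

cd ~/lammps
mkdir build; cd build
cmake -D PKG_ASPHERE=on -D PKG_KSPACE=on -D PKG_GPU=on \
      -D GPU_API=cuda -D GPU_ARCH=sm_60 -D GPU_PREC=mixed ../cmake
make

With this approach there is no need to edit Makefiles in lib/gpu or
src/MAKE by hand.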