Build the examples by running the following in each example directory:
make -j 16

To specify a target device:
make openmp -j 16
make pthreads -j 16
make serial -j 16
make cuda -j 16
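
Since the commands above have to be repeated in each directory, a small
shell helper can do the looping. The following is only a sketch under
stated assumptions: the function name build_all is hypothetical, it assumes
every example lives in an immediate subdirectory that contains a file named
Makefile, and it reads the MAKE environment variable (defaulting to make)
so that another command can be substituted for a dry run.

```shell
# Hypothetical helper, not part of the examples themselves.
# Usage: build_all [target] [jobs]
#   target: openmp, pthreads, serial or cuda; leave empty for the default
#   jobs:   number of parallel make jobs (defaults to 16)
build_all() {
    target="$1"
    jobs="${2:-16}"
    make_cmd="${MAKE:-make}"   # assumption: MAKE may override the command
    for dir in */; do
        # Skip subdirectories without a Makefile (and an unmatched glob).
        [ -f "${dir}Makefile" ] || continue
        echo "== ${dir%/}"
        # $target is left unquoted on purpose so an empty target is dropped.
        ( cd "$dir" && $make_cmd $target -j "$jobs" ) || return 1
    done
}
```

After defining the function, run "build_all openmp" from the directory that
holds the example subdirectories to build every example for OpenMP, or
"build_all" alone for the default target. The loop stops at the first
directory whose build fails.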

The lambda variants cannot be built with CUDA=yes at the moment, since
CUDA does not support lambdas from the host.
Some of the advanced topics try to highlight performance impacts by timing
different variants of the same operation.
Also, some of the advanced topics (in particular hierarchical parallelism)
require C++11 even without using host-side lambdas. CUDA 6.5 can be used
to compile those.