
11 June 2013

445. GAMESS US 2013 R1 on Debian (Wheezy) -- w/o GPU

Update 27/6/2013:
Please note that Kirill Berezovsky has published a series of posts on GAMESS US, including how to compile it for both CPU and GPU use. See
http://biochemicalmatters.blogspot.com.au/2013/06/gamess-us-frequently-asked-questions_26.html
http://biochemicalmatters.blogspot.ru/2013/06/gamess-us-frequently-asked-questions_1687.html
http://biochemicalmatters.blogspot.ru/2013/06/gamess-us-frequently-asked-questions_1447.html
http://biochemicalmatters.blogspot.com.au/2013/06/gamess-us-frequently-asked-questions.html


Original post:
A new version of GAMESS is out now (2013 R1): https://groups.google.com/forum/?fromgroups#!topic/gamess-announce/8j1esKifzEo

GPU support will be a later post.

0. Install a math library
You can use e.g. acml or atlas. See http://verahill.blogspot.com.au/2013/05/422-set-up-acml-on-linux.html for acml (which I've only had luck with on AMD machines).

To get the debian ATLAS libs do
sudo apt-get install libatlas3-base libatlas-dev

If you want to compile your own ATLAS libs, see e.g. http://verahill.blogspot.com.au/2012/09/rocks-543-atlas-and-gromacs-on-xeon.html or http://verahill.blogspot.com.au/2012/09/compile-atlas-gromacs-nwchem-on-amd-fx.html

You can also link to openblas, as shown in this post: http://verahill.blogspot.com.au/2012/09/compiling-and-testing-gamess-us-on.html


1. Get GAMESS US
Go to http://www.msg.chem.iastate.edu/GAMESS/download/register/
Check the tick boxes next to the architectures you intend to use GAMESS on. Fill out your email address, and hit Submit. You'll receive an email with instructions almost immediately.

The email will contain a URL to an archive with the source code, and a password for downloading it.
I'll presume that you downloaded the file, gamess-current.tar.gz, to ~/Downloads.


2. Untar and prepare
I'm presuming that you don't already have any /opt/gamess or /opt/gamess/gamess directories. In my earlier write-ups (e.g. this) I put everything in /opt/gamess, which isn't a good long-term strategy since you often want to keep earlier versions of computational software alongside newer ones.

sudo apt-get install build-essential gfortran openmpi-bin libopenmpi-dev libboost-all-dev
sudo mkdir /opt/gamess -p
sudo chown $USER:$USER /opt/gamess
cd /opt/gamess
cp ~/Downloads/gamess-current.tar.gz gamess-2013r1.tar.gz
tar xvf gamess-2013r1.tar.gz
mv gamess gamess-2013r1
cd gamess-2013r1/


3. Configure
NOTE: Even if you have gfortran 4.7 (e.g. on Wheezy), you should give the version as 4.6 during configure (see below).
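To see what you actually have installed (a stock Wheezy typically reports 4.7.x), a quick check:

gfortran -dumpversion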


./config
please enter your target machine name: linux64
GAMESS directory? [/opt/gamess/gamess-2013r1]
GAMESS build directory? [/opt/gamess/gamess-2013r1]
Version? [00] 13
Please enter your choice of FORTRAN: gfortran
Please enter only the first decimal place, such as 4.1 or 4.6: 4.6
ACML:
Enter your choice of 'mkl' or 'atlas' or 'acml' or 'none': acml
enter this full pathname: /opt/acml/acml5.3.1
Math library 'acml' will be taken from /opt/acml/acml5.3.1/gfortran64_int64/lib
Atlas:
I've compiled ATLAS myself in the past, but the libs have not worked with all programs. Here we use the debian libs instead.
Enter your choice of 'mkl' or 'atlas' or 'acml' or 'none': atlas
Please enter the Atlas subdirectory on your system: /usr/lib/atlas-base
Math library 'atlas' will be taken from /usr/lib/atlas-base
mpi:
communication library ('sockets' or 'mpi')? mpi
Enter MPI library (impi, mvapich2, mpt, sockets): openmpi
Please enter your openmpi's location: /usr/lib/openmpi

4. Build
Edit comp and change
1663 # -fno-whole-file suppresses argument's data type checking
1664       set OPT='-O2'
1665       if (".$GMS_DEBUG_FLAGS" != .) set OPT="$GMS_DEBUG_FLAGS"
to
1663 # -fno-whole-file suppresses argument's data type checking
1664       set OPT='-O0'
1665       if (".$GMS_DEBUG_FLAGS" != .) set OPT="$GMS_DEBUG_FLAGS"
or exam44.inp will fail.

cd ddi/
./compddi
cd ../
./compall

If you are building with openmpi, edit lked and change
 958             case openmpi:
 959                set MPILIBS="-L$GMS_MPI_PATH/lib64"
 960                set MPILIBS="$MPILIBS -lmpi"
 961                breaksw
to
 958             case openmpi:
 959                set MPILIBS="-L$GMS_MPI_PATH/lib"
 960                set MPILIBS="$MPILIBS -lmpi"
 961                breaksw
Make the proper symlinks if you are using ATLAS:
sudo ln -s /usr/lib/atlas-base/libatlas.so.3 /usr/lib/atlas-base/libatlas.so
sudo ln -s /usr/lib/atlas-base/libf77blas.so.3 /usr/lib/atlas-base/libf77blas.so

./lked gamess 13r1 


I use my own script called gmrun:
#!/bin/csh
set TARGET=mpi
set SCR=$HOME/scratch
set USERSCR=/scratch
set GMSPATH=/opt/gamess/gamess-2013r1
set JOB=$1
set VERNO=$2
set NCPUS=$3

# strip the .inp extension if it was included on the command line
if ( $JOB:r.inp == $JOB ) set JOB=$JOB:r
echo "Copying input file $JOB.inp to your run's scratch directory..."
cp $JOB.inp $SCR/$JOB.F05

setenv TRAJECT $USERSCR/$JOB.trj
setenv RESTART $USERSCR/$JOB.rst
setenv INPUT $SCR/$JOB.F05
setenv PUNCH $USERSCR/$JOB.dat
if ( -e $TRAJECT ) rm $TRAJECT
if ( -e  $PUNCH ) rm $PUNCH
if ( -e  $RESTART ) rm $RESTART
source $GMSPATH/gms-files.csh

setenv LD_LIBRARY_PATH /usr/lib/openmpi/lib:$LD_LIBRARY_PATH
mpiexec -n $NCPUS $GMSPATH/gamess.$VERNO.x|tee $JOB.out
cp $PUNCH .

To run, do e.g.
gmrun exam01.inp 13r1 2
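Since exam44 is the test case that is sensitive to the -O2 issue mentioned above, it's worth checking it explicitly once everything is built. A minimal sanity check, assuming you run it from the standard test directory and that gmrun is on your PATH:

cd /opt/gamess/gamess-2013r1/tests/standard
gmrun exam44.inp 13r1 2
grep -c "TERMINATED NORMALLY" exam44.out

A count of 1 means the job finished; for a stricter check, compare the final energy with the reference values given in the comments of the exam input files.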

09 May 2013

409.B.GAMESS US with GPU support on debian wheezy --the ACML edition. This works.


Update 27/6/2013:
Please note that Kirill Berezovsky has published a series of posts on GAMESS US, including how to compile it for both CPU and GPU use. See
http://biochemicalmatters.blogspot.com.au/2013/06/gamess-us-frequently-asked-questions_26.html
http://biochemicalmatters.blogspot.ru/2013/06/gamess-us-frequently-asked-questions_1687.html
http://biochemicalmatters.blogspot.ru/2013/06/gamess-us-frequently-asked-questions_1447.html
http://biochemicalmatters.blogspot.com.au/2013/06/gamess-us-frequently-asked-questions.html


Update 21 May 2013: See the comments below this post. This approach most likely works -- what has been confusing me is the lack of reports of GPU timings in the output, but this doesn't necessarily mean that the GPU isn't being used. The poster below, using nvidia-smi, observed GPU usage, although the speed-up was not major.

Blogspot needs versioning.
I lost the entire post when it was almost complete. Screw this.

Everything compiles fine, but no GPU output during calculation.

I see no evidence of the GPU being used at any stage.  Otherwise all is good -- the calcs run fine on the CPU.

Maybe someone else will have a better idea.

I looked at libcchem/aaa.readme.1st and http://combichem.blogspot.com.au/2011/02/compiling-gamess-with-cuda-gpu-support.html to get as far as I did.

Setting up gamess
Get gamess (see e.g. http://verahill.blogspot.com.au/2012/09/compiling-and-testing-gamess-us-on.html). Put gamess-current.tar.gz in ~/tmp

sudo apt-get install libboost-all-dev build-essential g++ gfortran automake nvidia-cuda-toolkit python-cheetah openmpi-bin libopenmpi-dev zlib1g-dev checkinstall
mkdir ~/tmp
cd ~/tmp
tar xvf gamess-current.tar.gz
sudo mv gamess /opt/gamess_cuda
sudo chown $USER:$USER /opt/gamess_cuda -R


ACML
Download both the 'regular' and the int64 gfortran packages from AMD:
http://developer.amd.com/tools-and-sdks/cpu-development/amd-core-math-library-acml/acml-downloads-resources/#download

tar xvf acml-5-3-1-gfortran-64bit-int64.tgz
tar xvf acml-5-3-1-gfortran-64bit.tgz
sh install-acml-5-3-1-gfortran-64bit-int64.sh
Where do you want to install ACML? Press return to use the default location (/opt/acml5.3.1),
or enter an alternative path. The directory will be created if it does not already exist.
> /opt/acml/acml5.3.1
sh install-acml-5-3-1-gfortran-64bit.sh
Where do you want to install ACML? Press return to use the default location (/opt/acml5.3.1),
or enter an alternative path. The directory will be created if it does not already exist.
> /opt/acml/acml5.3.1
You'll get something like this:
/opt/acml/acml5.3.1
|-- Doc
|-- gfortran64
|-- gfortran64_fma4
|-- gfortran64_fma4_int64
|-- gfortran64_fma4_mp
|-- gfortran64_fma4_mp_int64
|-- gfortran64_int64
|-- gfortran64_mp
|-- gfortran64_mp_int64
`-- util

where
*  fma4 is for CPUs with FMA4 support (use util/cpuid to check, or see the snippet below)
*  int64 is for builds with 64-bit (integer*8) integers
*  mp is for OpenMP. For MPI do not use the _mp_ libraries!
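If you'd rather not run util/cpuid, the kernel's CPU flags give the same answer; a quick check (assumes a Linux /proc filesystem):

grep -o -m1 fma4 /proc/cpuinfo || echo "no FMA4 -- use the plain gfortran64* directories"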

Pick your library/ies and add them to the LD_LIBRARY_PATH, e.g.:
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/acml/acml5.3.1/gfortran64_int64/lib' >> ~/.bashrc
source ~/.bashrc


CBLAS
sudo mkdir -p /opt/netlib
sudo chown $USER:$USER /opt/netlib
cd /opt/netlib/
wget http://www.netlib.org/blas/blast-forum/cblas.tgz
tar xvf cblas.tgz
cd CBLAS/

Edit Makefile.LINUX
24
25 BLLIB = /opt/acml/acml5.3.1/gfortran64_int64/lib/libacml.a
26 CBLIB = ../lib/cblas_$(PLAT).a
27
cp Makefile.LINUX Makefile.in
make

patching libboost
sudo su
cd /usr/include/boost
patch -p1 < /opt/gamess_cuda/libcchem/boot/
exit

Make the following changes by hand if the patch didn't work:

/usr/include/boost/mpl/aux_/integral_wrapper.hpp
47 // other compilers (e.g. MSVC) are not particulary happy about it
48 #if BOOST_WORKAROUND(__EDG_VERSION__, <= 238) || defined(__CUDACC__)
49 typedef struct AUX_WRAPPER_NAME type;
/usr/include/boost/mpl/size_t_fwd.hpp
20
21 BOOST_MPL_AUX_ADL_BARRIER_NAMESPACE_OPEN
22 #if defined(__CUDACC__)
23 typedef std::size_t std_size_t;
24 template< std_size_t N > struct size_t;
25 #else
26 template< std::size_t N > struct size_t;
27 #endif
28
29 BOOST_MPL_AUX_ADL_BARRIER_NAMESPACE_CLOSE
/usr/include/boost/mpl/size_t.hpp
19 #if defined(__CUDACC__)
20 #define AUX_WRAPPER_VALUE_TYPE std_size_t
21 #define AUX_WRAPPER_NAME size_t
22 #define AUX_WRAPPER_PARAMS(N) std_size_t N
23 #else
24 #define AUX_WRAPPER_VALUE_TYPE std::size_t
25 #define AUX_WRAPPER_NAME size_t
26 #define AUX_WRAPPER_PARAMS(N) std::size_t N
27 #endif
28

HDF5
mkdir ~/tmp
cd ~/tmp
wget http://www.hdfgroup.org/ftp/HDF5/current/src/hdf5-1.8.10-patch1.tar.gz
tar xvf hdf5-1.8.10-patch1.tar.gz
cd hdf5-1.8.10-patch1/
export CC=/usr/bin/gcc-4.6 && export CXX=/usr/bin/g++-4.6
./configure --prefix=/opt/gamess_cuda/hdf5 --with-pthread --enable-cxx --enable-threadsafe --enable-unsupported
make
mkdir /opt/gamess_cuda/hdf5/lib -p
mkdir /opt/gamess_cuda/hdf5/include -p
sudo checkinstall
This package will be built according to these values:
0 -  Maintainer: [ root@neon ]
1 -  Summary: [ hdf5-cxx ]
2 -  Name: [ hdf5-1.8.10 ]
3 -  Version: [ 1.8.10-1 ]
4 -  Release: [ 1 ]
5 -  License: [ GPL ]
6 -  Group: [ checkinstall ]
7 -  Architecture: [ amd64 ]
8 -  Source location: [ hdf5-1.8.10-patch1 ]
9 -  Alternate source location: [ ]
10 - Requires: [ ]
11 - Provides: [ hdf5-1.8.10 ]
12 - Conflicts: [ ]
13 - Replaces: [ ]
Make sure to edit the Version field, since 'patch-1' leads to an error (the version string must start with a digit).

LIBCCHEM
Edit /opt/gamess_cuda/libcchem/src/externals/boost/cuda/device_ptr.hpp and /opt/gamess_cuda/libcchem/rysq/src/externals/boost/cuda/device_ptr.hpp. Insert
#include <stddef.h>
somewhere at the beginning of each file.
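If you'd rather do it from the command line, something along these lines should work (a sketch; adjust the paths if your tree is laid out differently):

sed -i '1i #include <stddef.h>' \
  /opt/gamess_cuda/libcchem/src/externals/boost/cuda/device_ptr.hpp \
  /opt/gamess_cuda/libcchem/rysq/src/externals/boost/cuda/device_ptr.hpp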

cd /opt/gamess_cuda/libcchem
./configure --with-gamess --with-hdf5=/opt/gamess_cuda/hdf5 CPPFLAGS="-I/opt/gamess_cuda/hdf5/include" --with-cuda=/usr --disable-openmp --prefix=/opt/gamess_cuda/libcchem --with-gpu=fermi --with-integer8 --with-cublas
make
make install


Configure GAMESS US
cd /opt/gamess_cuda
./config
please enter your target machine name: linux64
GAMESS directory? [/opt/gamess_cuda]
GAMESS build directory? [/opt/gamess_cuda]
Version? [00] 12
Please enter your choice of FORTRAN: gfortran
Please enter only the first decimal place, such as 4.1 or 4.6: 4.6
Enter your choice of 'mkl' or 'atlas' or 'acml' or 'none': acml
enter this full pathname: /opt/acml/acml5.3.1
communication library ('sockets' or 'mpi')? mpi
Enter MPI library (impi, mvapich2, mpt, sockets): openmpi
Please enter your openmpi's location: /opt/openmpi/1.6

Compile
cd ddi/
./compddi
cd ..

Edit comp
872 # see ~/gamess/libcchem/aaa.readme.1st for more information
873 set GPUCODE=true
874 if ($GPUCODE == true) then
and
1663 # -fno-whole-file suppresses argument's data type checking
1664       set OPT='-O0'
1665       if (".$GMS_DEBUG_FLAGS" != .) set OPT="$GMS_DEBUG_FLAGS"
./compall

Edit lked
69 #
70 set GPUCODE=true
71 #
72 # 5. optional MPQC interface
and
958 case openmpi:
959    set MPILIBS="-L$GMS_MPI_PATH/lib"
960    set MPILIBS="$MPILIBS -lmpi -lpthread"
961    breaksw
and
1214 if ($GPUCODE == true) then
1215    echo " Using 'libcchem' add-in C++ codes for Nvidia/CUDA GPUs."
1216    set GPU_LIBS="-L/opt/gamess_cuda/libcchem/lib -lcchem_gamess -lcchem -lrysq"
1217    set GPU_LIBS="$GPU_LIBS -lcudart -lcublas"
1218    ###     GPU_LIBS="$GPU_LIBS -lcudart -lcublas"
1219    set GPU_LIBS="$GPU_LIBS /usr/lib/libboost_thread.a"
1220    set GPU_LIBS="$GPU_LIBS /opt/gamess_cuda/hdf5/lib/libhdf5.a"
1221    set GPU_LIBS="$GPU_LIBS /opt/gamess_cuda/hdf5/lib/libhdf5_cpp.a"
1222    set GPU_LIBS="$GPU_LIBS /opt/gamess_cuda/hdf5/lib/libhdf5_hl.a"
1223    set GPU_LIBS="$GPU_LIBS /opt/gamess_cuda/hdf5/lib/libhdf5.a"
1224    set GPU_LIBS="$GPU_LIBS /opt/acml/acml5.3.1/gfortran64_int64/lib/libacml.a /opt/netlib/CBLAS/lib/cblas_LINUX.a"
1225    set GPU_LIBS="$GPU_LIBS -lz"
1226    set GPU_LIBS="$GPU_LIBS -lstdc++"
1227    ###     GPU_LIBS="$GPU_LIBS -lgomp"
1228    set GPU_LIBS="$GPU_LIBS -lpthread"
1229    echo " libcchem GPU code's libraries are"
1230    echo "$GPU_LIBS"
1231 else
./lked gamess gpu.12

Run script
Create rungpu:
#!/bin/csh -v
set TARGET=mpi
set SCR=$HOME/scratch
set USERSCR=/scratch
set GMSPATH=/opt/gamess_cuda
set JOB=$1
set VERNO=$2
set NCPUS=$3
set PPN=$3

@ NUMGPU=1
if ($NUMGPU > 0) then
   @ NUMCPU = $NCPUS - 1
   echo libcchem kernels will use $NUMCPU cores and $NUMGPU GPUs per node...
   set echo
   setenv CCHEM_PROFILE 1
   setenv NUM_THREADS $NCPUS
   setenv GPU_DEVICES 0
   #--if ($NUMGPU == 0) setenv GPU_DEVICES -1
   #--if ($NUMGPU == 2) setenv GPU_DEVICES 0,1
   #--if ($NUMGPU == 4) setenv GPU_DEVICES 0,1,2,3
   #setenv LD_LIBRARY_PATH /share/apps/cuda/lib64:$LD_LIBRARY_PATH
   ###### LD_LIBRARY_PATH /usr/local/cuda/lib64:$LD_LIBRARY_PATH
   unset echo
else
   echo NO GPU
   setenv GPU_DEVICES -1
endif

if ( $JOB:r.inp == $JOB ) set JOB=$JOB:r
echo "Copying input file $JOB.inp to your run's scratch directory..."
cp $JOB.inp $SCR/$JOB.F05

setenv TRAJECT $USERSCR/$JOB.trj
setenv RESTART $USERSCR/$JOB.rst
setenv INPUT $SCR/$JOB.F05
setenv PUNCH $USERSCR/$JOB.dat
if ( -e $TRAJECT ) rm $TRAJECT
if ( -e $PUNCH ) rm $PUNCH
if ( -e $RESTART ) rm $RESTART
source $GMSPATH/gms-files.csh

setenv LD_LIBRARY_PATH /opt/openmpi/1.6/lib:/opt/netlib/CBLAS/lib:/opt/acml/acml5.3.1/gfortran64_int64/lib
set path= ( /opt/openmpi/1.6/bin $path )
/opt/openmpi/1.6/bin/mpiexec -n $NCPUS $GMSPATH/gamess.gpu.$VERNO.x|tee $JOB.out
cp $PUNCH .
Run chmod +x rungpu to make it executable.

Add /opt/gamess_cuda to path:
echo 'export PATH=$PATH:/opt/gamess_cuda' >> ~/.bashrc
source ~/.bashrc

Testing
cd /opt/gamess_cuda/tests/standard
rungpu exam44 12 2
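To see whether the GPU is actually being used (this is what the commenter mentioned in the update above did), watch the card from a second terminal while the job runs:

watch -n 1 nvidia-smi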

28 October 2012

268. Compiling and testing GAMESS US on debian testing (wheezy)

Update 3: 9 May 2013. Fixed a couple of mistakes e.g. related to mpi. I also switched from ATLAS to acml -- when I build with ATLAS a lot of the example inputs do not converge.

Update 2: Pietro (see posts below) identified some odd behaviour when running test exam44 in which the scf failed to converge. The (temporary) fix for that has been included in the instructions below (change line 1664 in the file 'comp') -- most likely it's only a single file which needs to be compiled with -O0, but it will take a while to identify which one that is. Having to use -O0 on a performance critical piece of software is obviously unfortunate.

Update:
I've done this on ROCKS 5.4.3/ Centos 5.6 as well. Be aware that because of the ancient version of gfortran (4.1.2) on ROCKS there will be some limitations:

   Alas, your version of gfortran does not support REAL*16,
   so relativistic integrals cannot use quadruple precision.
   Other than this, everything will work properly.
Other than that, follow the instructions below (including editing lked)

Original post:
Computing solvation energies with implicit solvation models is a tough nut to crack. I like working with NWChem, but only one solvation model (COSMO) is implemented there; it has a history of giving results that differ wildly (>20 kcal/mol!) from those of other packages, partly due to a bug which was fixed in 2011 (using b3lyp/6-311++g** with the cosmo parameters in that post I got 63.68 kcal/mol for Cl-); and I'm still not sure how to use the COSMO module properly (is rsolv 0 a reasonable value?). Obviously, my own unfamiliarity with the method is another issue, but that's where the idea of sane defaults comes in. So, time to test and compare with other models. Reading Cramer, C. J.; Truhlar, D. G. Acc. Chem. Res. 2008, 41, 760–768 got me interested in GAMESS US again.

Gaussian is not really an attractive option for me anymore for performance reasons (caveat: as seen by me on my particular systems using precompiled binaries). Free (source code + cost) is obviously also always attractive. Being a linux sort of person also plays into it.

So, here's how to get your cluster set up for gamess US:
1. Go to http://www.msg.chem.iastate.edu/GAMESS/download/register/
Select agree, then pick your version -- in my case
GAMESS version May 1, 2012 R1 for 64 bit (x86_64 compatible) under Linux with gnu compilers

Once you've completed your order you're told you may have to wait for up to a week before your registration is approved, but I got approved in less than 24 hours.

[2. Register for GAMESSPLUS at http://comp.chem.umn.edu/license/form-user.html
Again, it may take a little while to get approved -- in my case it was less than 24 hours. Also, it seems that you don't need a separate GAMESSPLUS anymore]

3. Download gamess-current.tar.gz as per the instructions and put it in /opt/gamess (once you've created the folder)

4. If you're using AMD you're in luck -- set up acml on your system. In my case I put everything in /opt/acml/acml5.2.0

I've had bad luck with ATLAS.

5. Compile
sudo apt-get install build-essential gfortran openmpi-bin libopenmpi-dev libboost-all-dev
sudo mkdir /opt/gamess
sudo chown $USER /opt/gamess
cd /opt/gamess
tar xvf gamess-current.tar.gz 
cd gamess/

You're now ready to autoconfigure.

The lengthy autoconfigure.
 Note that
* the location of your openmpi libs may vary -- the debian libs go in /usr/lib/openmpi/lib by default, but I'm using my own compiled version which I've put in /opt/openmpi
* gamess is linked against the static libraries by default, so if you compiled atlas as is described elsewhere on this blog, you'll be fine.

./config
This script asks a few questions, depending on your computer system,
to set up compiler names, libraries, message passing libraries,
and so forth.
 
You can quit at any time by pressing control-C, and then <return>.
 
Please open a second window by logging into your target machine,
in case this script asks you to 'type' a command to learn something
about your system software situation.  All such extra questions will
use the word 'type' to indicate it is a command for the other window.
 
After the new window is open, please hit <return> to go on.

   GAMESS can compile on the following 32 bit or 64 bit machines:
axp64    - Alpha chip, native compiler, running Tru64 or Linux
cray-xt  - Cray's massively parallel system, running CNL
hpux32   - HP PA-RISC chips (old models only), running HP-UX
hpux64   - HP Intel or PA-RISC chips, running HP-UX
ibm32    - IBM (old models only), running AIX
ibm64    - IBM, Power3 chip or newer, running AIX or Linux
ibm64-sp - IBM SP parallel system, running AIX
ibm-bg   - IBM Blue Gene (P or L model), these are 32 bit systems
linux32  - Linux (any 32 bit distribution), for x86 (old systems only)
linux64  - Linux (any 64 bit distribution), for x86_64 or ia64 chips
           AMD/Intel chip Linux machines are sold by many companies
mac32    - Apple Mac, any chip, running OS X 10.4 or older
mac64    - Apple Mac, any chip, running OS X 10.5 or newer
sgi32    - Silicon Graphics Inc., MIPS chip only, running Irix
sgi64    - Silicon Graphics Inc., MIPS chip only, running Irix
sun32    - Sun ultraSPARC chips (old models only), running Solaris
sun64    - Sun ultraSPARC or Opteron chips, running Solaris
win32    - Windows 32-bit (Windows XP, Vista, 7, Compute Cluster, HPC Edition)
win64    - Windows 64-bit (Windows XP, Vista, 7, Compute Cluster, HPC Edition)
winazure - Windows Azure Cloud Platform running Windows 64-bit
    type 'uname -a' to partially clarify your computer's flavor.
please enter your target machine name: linux64

Where is the GAMESS software on your system?
A typical response might be /u1/mike/gamess,
most probably the correct answer is /home/me/tmp/gamess
 
GAMESS directory? [/opt/gamess] /opt/gamess

Setting up GAMESS compile and link for GMS_TARGET=linux64
GAMESS software is located at GMS_PATH=/home/me/tmp/gamess
 
Please provide the name of the build location.
This may be the same location as the GAMESS directory.
 
GAMESS build directory? [/opt/gamess] /opt/gamess

Please provide a version number for the GAMESS executable.
This will be used as the middle part of the binary's name,
for example: gamess.00.x

Version? [00] 12r1

Linux offers many choices for FORTRAN compilers, including the GNU
compiler set ('g77' in old versions of Linux, or 'gfortran' in
current versions), which are included for free in Unix distributions.
 
There are also commercial compilers, namely Intel's 'ifort',
Portland Group's 'pgfortran', and Pathscale's 'pathf90'.  The last
two are not common, and aren't as well tested as the others.
 
type 'rpm -aq | grep gcc' to check on all GNU compilers, including gcc
type 'which gfortran'  to look for GNU's gfortran (a very good choice),
type 'which g77'       to look for GNU's g77,
type 'which ifort'     to look for Intel's compiler,
type 'which pgfortran' to look for Portland Group's compiler,
type 'which pathf90'   to look for Pathscale's compiler.
Please enter your choice of FORTRAN: gfortran

gfortran is very robust, so this is a wise choice.

Please type 'gfortran -dumpversion' or else 'gfortran -v' to
detect the version number of your gfortran.
This reply should be a string with at least two decimal points,
such as 4.1.2 or 4.6.1, or maybe even 4.4.2-12.
The reply may be labeled as a 'gcc' version,
but it is really your gfortran version.
Please enter only the first decimal place, such as 4.1 or 4.6:  
4.6

   Good, the newest gfortran can compile REAL*16 data type.
hit <return> to continue to the math library setup.

Linux distributions do not include a standard math library.
 
There are several reasonable add-on library choices,
       MKL from Intel           for 32 or 64 bit Linux (very fast)
      ACML from AMD             for 32 or 64 bit Linux (free)
     ATLAS from www.rpmfind.net for 32 or 64 bit Linux (free)
and one very unreasonable option, namely 'none', which will use
some slow FORTRAN routines supplied with GAMESS.  Choosing 'none'
will run MP2 jobs 2x slower, or CCSD(T) jobs 5x slower.
 
Some typical places (but not the only ones) to find math libraries are
Type 'ls /opt/intel/mkl'                 to look for MKL
Type 'ls /opt/intel/Compiler/mkl'        to look for MKL
Type 'ls /opt/intel/composerxe/mkl'      to look for MKL
Type 'ls -d /opt/acml*'                  to look for ACML
Type 'ls -d /usr/local/acml*'            to look for ACML
Type 'ls /usr/lib64/atlas'               to look for Atlas
 
Enter your choice of 'mkl' or 'atlas' or 'acml' or 'none': acml

Type 'ls -d /opt/acml*' or 'ls -d /usr/local/acml*'
and note the the full path, which includes a version number.
enter this full pathname: /opt/acml/acml5.2.0
Math library 'acml' will be taken from /opt/acml/acml5.2.0/gfortran64_int64/lib
please hit <return> to compile the GAMESS source code activator
gfortran -o /home/me/tmp/gamess/build/tools/actvte.x actvte.f
unset echo
Source code activator was successfully compiled.
 
please hit <return> to set up your network for Linux clusters.

If you have a slow network, like Gigabit Ethernet (GE), or
if you have so few nodes you won't run extensively in parallel, or
if you have no MPI library installed, or
if you want a fail-safe compile/link and easy execution,
     choose 'sockets'
to use good old reliable standard TCP/IP networking.
 
If you have an expensive but fast network like Infiniband (IB), and
if you have an MPI library correctly installed,
     choose 'mpi'.
 
communication library ('sockets' or 'mpi')? mpi

The MPI libraries which work well on linux64/Infiniband are
      Intel's MPI (impi)
      MVAPICH2
      SGI's mpt from ProPack, on Altix/ICE systems
Other libraries may work, please see 'readme.ddi' for info.
The choices listed above will compile and link easily,
and are known to run correctly and efficiently.

Enter 'sockets' if you just changed your mind about trying MPI.

Enter MPI library (impi, mvapich2, mpt, sockets): openmpi
MPI can be installed in many places, so let's find openmpi.
The person who installed your MPI can tell you where it really is.
 
impi     is probably located at a directory like
              /opt/intel/impi/3.2
              /opt/intel/impi/4.0.1.007
              /opt/intel/impi/4.0.2.003
         include iMPI's version numbers in your reply below.
mvapich2 could be almost anywhere, perhaps some directory like
              /usr/mpi/gcc/mvapich2-1.6
openmpi  could be almost anywhere, perhaps some directory like
              /usr/mpi/openmpi-1.4.3
mpt      is probably located at a directory like
              /opt/sgi/mpt/mpt-1.26
Please enter your openmpi's location: /opt/openmpi/1.6

Your configuration for GAMESS compilation is now in
     /home/me/tmp/gamess/build/install.info
Now, please follow the directions in
     /opt/gamess/machines/readme.unix



I next did this:

cd /opt/gamess/ddi
./compddi
cd ../

Edit the file 'comp' and change it from

1664       set OPT='-O2'
to
1664       set OPT='-O0'

or test case exam44.inp in tests/standard will fail due to lack of SCF convergence. (I've tried -O1 as well with no luck)

Continue your compilation:
./compall

Running './compall' reads install.info, which I include below:
#!/bin/csh
#   compilation configuration for GAMESS
#   generated on beryllium
#   generated at Friday 21 September  08:48:09 EST 2012
setenv GMS_PATH            /opt/gamess
setenv GMS_BUILD_DIR       /opt/gamess
#         machine type
setenv GMS_TARGET          linux64
#         FORTRAN compiler setup
setenv GMS_FORTRAN         gfortran
setenv GMS_GFORTRAN_VERNO  4.6
#         mathematical library setup
setenv GMS_MATHLIB         acml
setenv GMS_MATHLIB_PATH    /opt/acml/acml5.2.0//gfortran64_int64/lib
#         parallel message passing model setup
setenv GMS_DDI_COMM        mpi
setenv GMS_MPI_LIB         openmpi
setenv GMS_MPI_PATH        /opt/openmpi/1.6

Note that you can't change the gfortran version here either -- 4.7 won't be recognised.

Anyway, compilation will take a while -- enough for some coffee and reading.

In the next step you may have problems with openmpi -- lked looks in e.g. /opt/openmpi/1.6/lib64 but you'll probably only have /opt/openmpi/1.6/lib

Edit lked and change
 958             case openmpi:
 959                set MPILIBS="-L$GMS_MPI_PATH/lib64"
 960                set MPILIBS="$MPILIBS -lmpi"
 961                breaksw
to
 958             case openmpi:
 959                set MPILIBS="-L$GMS_MPI_PATH/lib"
 960                set MPILIBS="$MPILIBS -lmpi -lpthread"
 961                breaksw



Generate the runtime file:
./lked gamess 12r1 >&  lked.log

Done!


To compile with openblas:
1. edit install.info
#!/bin/csh
#   compilation configuration for GAMESS
#   generated on tantalum
#   generated at Friday 21 September  14:01:54 EST 2012
setenv GMS_PATH            /opt/gamess
setenv GMS_BUILD_DIR       /opt/gamess
#         machine type
setenv GMS_TARGET          linux64
#         FORTRAN compiler setup
setenv GMS_FORTRAN         gfortran
setenv GMS_GFORTRAN_VERNO  4.6
#         mathematical library setup
setenv GMS_MATHLIB         openblas
setenv GMS_MATHLIB_PATH    /opt/openblas/lib
#         parallel message passing model setup
setenv GMS_DDI_COMM        mpi
setenv GMS_MPI_LIB         openmpi
setenv GMS_MPI_PATH        /opt/openmpi/1.6

2. edit lked
Add lines 462-466 which sets up the openblas switch.

 453       endif
 454       set BLAS=' '
 455       breaksw
 456 
 457    case acml:
 458       #     do a static link so that only compile node needs to install ACML
 459       set MATHLIBS="$GMS_MATHLIB_PATH/libacml.a"
 460       set BLAS=' '
 461       breaksw
 462 case openblas:
 463        #     do a static link so that only compile node needs to install openblas
 464        set MATHLIBS="$GMS_MATHLIB_PATH/libopenblas.a"
 465        set BLAS=' '
 466        breaksw
 467 
 468    case none:
 469    default:
 470       echo "Warning.  No math library was found, you should install one."
 471       echo "    MP2 calculations speed up about 2x with a math library."
 472       echo "CCSD(T) calculations speed up about 5x with a math library."
 473       set BLAS='blas.o'
 474       set MATHLIBS=' '
 475       breaksw

3. Link
./lked gamess 12r2 >&  lked.log

You now have gamess.12r1.x, which uses acml, and gamess.12r2.x, which uses openblas.

To run:
The rungms file was a bit too 'clever' for me, so I boiled it down to a file called gmrun, which I made executable (chmod +x gmrun) and put in /opt/gamess:

#!/bin/csh
set TARGET=mpi
set SCR=$HOME/scratch
set USERSCR=/scratch
set GMSPATH=/opt/gamess
set JOB=$1
set VERNO=$2
set NCPUS=$3

if ( $JOB:r.inp == $JOB ) set JOB=$JOB:r
echo "Copying input file $JOB.inp to your run's scratch directory..."
cp $JOB.inp $SCR/$JOB.F05

setenv TRAJECT $USERSCR/$JOB.trj
setenv RESTART $USERSCR/$JOB.rst
setenv INPUT $SCR/$JOB.F05
setenv PUNCH $USERSCR/$JOB.dat
if ( -e $TRAJECT ) rm $TRAJECT
if ( -e  $PUNCH ) rm $PUNCH
if ( -e  $RESTART ) rm $RESTART
source $GMSPATH/gms-files.csh

setenv LD_LIBRARY_PATH /opt/openmpi/1.6/lib:$LD_LIBRARY_PATH
set path= ( /opt/openmpi/1.6/bin $path )
/opt/openmpi/1.6/bin/mpiexec -n $NCPUS $GMSPATH/gamess.$VERNO.x|tee $JOB.out
cp $PUNCH .

Note that I actually do have two scratch directories -- one in ~/scratch and one in /scratch. The SCR directory should be local to the node as well as spacious, while USERSCR can be a networked, smaller directory.
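If those directories don't exist yet, create them before running anything; a minimal sketch assuming the same locations as in the script above:

mkdir -p ~/scratch
sudo mkdir -p /scratch
sudo chown $USER /scratch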

Finally do
echo 'export PATH=$PATH:/opt/gamess' >> ~/.bashrc

Anyway.
Navigate to your tests/standard folder, where there are a lot of exam*.inp files, and do

gmrun exam12 12r1 4

where exam12 (or exam12.inp) is the name of the input file, 12r1 is the version number (that you set above) and 4 is the number of processors/threads.
 
          ---------------------
          ELECTROSTATIC MOMENTS
          ---------------------

 POINT   1           X           Y           Z (BOHR)    CHARGE
                 0.000000   -0.000000    0.000000       -0.00 (A.U.)
         DX          DY          DZ         /D/  (DEBYE)
    -0.000000    0.000000   -0.000000    0.000000
 ...... END OF PROPERTY EVALUATION ......
 CPU     0: STEP CPU TIME=     0.02 TOTAL CPU TIME=        2.2 (    0.0 MIN)
 TOTAL WALL CLOCK TIME=        2.3 SECONDS, CPU UTILIZATION IS  97.78%
  $VIB   
          IVIB=   0 IATOM=   0 ICOORD=   0 E=      -76.5841347569
 -6.175208802E-40-6.175208802E-40-4.411868660E-07 6.175208802E-40 6.175208802E-40
  4.411868660E-07-1.441225933E-40-1.441225933E-40 1.672333111E-06 1.441225933E-40
  1.441225933E-40-1.672333111E-06
 -4.053383177E-34 4.053383177E-34-2.257541709E-15
 ......END OF GEOMETRY SEARCH......
 CPU     0: STEP CPU TIME=     0.00 TOTAL CPU TIME=        2.2 (    0.0 MIN)
 TOTAL WALL CLOCK TIME=        2.3 SECONDS, CPU UTILIZATION IS  97.35%
               990473  WORDS OF DYNAMIC MEMORY USED
 EXECUTION OF GAMESS TERMINATED NORMALLY Fri Sep 21 14:27:17 2012
 DDI: 263624 bytes (0.3 MB / 0 MWords) used by master data server.

 ----------------------------------------
 CPU timing information for all processes
 ========================================
 0: 2.160 + 0.44 = 2.204
 1: 2.220 + 0.20 = 2.240
 2: 2.212 + 0.32 = 2.244
 3: 4.240 + 0.04 = 4.244
 4: 4.260 + 0.00 = 4.260
 5: 4.256 + 0.08 = 4.264
 ----------------------------------------


Done!


Looking at another test case (acetate w/ cosmo) I get the following scaling on a single node as a function of processors:


shmmax issue:
Anyone who has been using nwchem will be familiar with this
 INPUT CARD> $END                                                                           
 DDI Process 0: shmget returned an error.
 Error EINVAL: Attempting to create 160525768 bytes of shared memory.
 Check system limits on the size of SysV shared memory segments.

 The file ~/gamess/ddi/readme.ddi contains information on how to display
 the current SystemV memory settings, and how to increase their sizes.
 Increasing the setting requires the root password, and usually a sytem reboot.

 DDI Process 0: error code 911

The fix is the same. First do

cat /proc/sys/kernel/shmmax

and look at the value. Then set it to the desired value according to this post: http://verahill.blogspot.com.au/2012/04/solution-to-nwchem-shmmax-too-small.html
e.g.
sudo sysctl -w kernel.shmmax=6269961216
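The sysctl -w setting does not survive a reboot; to make it permanent, add it to /etc/sysctl.conf (standard sysctl behaviour, nothing GAMESS-specific):

echo 'kernel.shmmax = 6269961216' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p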

gfortran version issue:
Even though you likely have version 4.7.x of gfortran, pick 4.6 or you will get:

Please type 'gfortran -dumpversion' or else 'gfortran -v' to
detect the version number of your gfortran.
This reply should be a string with at least two decimal points,
such as 4.1.2 or 4.6.1, or maybe even 4.4.2-12.
The reply may be labeled as a 'gcc' version,
but it is really your gfortran version.
Please enter only the first decimal place, such as 4.1 or 4.6:  
4.7

The gfortran version number is not recognized.
It should only have one decimal place, such as 4.x

The reason is this (code from config):
      switch ($GMS_GFORTRAN_VERNO)
         case 4.1:
         case 4.2:
         case 4.3:
         case 4.4:
         case 4.5:
            echo "   Alas, your version of gfortran does not support REAL*16,"
            echo "   so relativistic integrals cannot use quadruple precision."
            echo "   Other than this, everything will work properly."
            breaksw
         case 4.6:
            echo "   Good, the newest gfortran can compile REAL*16 data type."
            breaksw
         default:
            echo "The gfortran version number is not recognized."
            echo "It should only have one decimal place, such as 4.x"
            exit 4
            breaksw
      endsw

20 September 2012

243. My own personal benchmarks for NWChem, gromacs with atlas, openblas, acml on AMD and intel

Update: you can compile against acml on intel as well, and against mkl on amd. Still need to do some performance testing to see how well it works. The artificial penalty of running mkl on AMD is well-publicised and led to a lawsuit, but I don't know how acml performs on intel.


The title says it all, really. Since I'm back to exploring ways of improving performance for my little cluster I figured I'd break this out as a separate post. Most of this data was found here before: http://verahill.blogspot.com.au/2012/09/new-compute-node-using-amd-fx-8150.html

All units are running up-to-date debian testing (wheezy).

Configuration:
Boron (B): Phenom II X6 2.8 GHz, 8Gb RAM (2.8*6=16.8 GFLOPS predicted)
Neon (Ne): FX-8150 X8 3.6 GHz, 16 Gb RAM (3.6*8=28.8 GFLOPS predicted (int), 3.6*4=14.4 GFLOPS (fpu))
Tantalum (Ta): Quadcore i5-2400 3.1 GHz, 8 Gb RAM (3.1*4=12.4 GFLOPS predicted)
Vanadium (V):  Dual socket 2x Quadcore Xeon X3480 3.06 GHz, 8Gb. CentOS (ROCKS 5.4.3)/openblas.

Results

Gromacs --double (1 ns 6x6x6 nm tip4p water box; dynamic load balancing, double precision, 500k steps)
B  :  10.662 ns/day (11.8  GFLOPS, runtime 8104 seconds)***
B  :    9.921 ns/day ( 10.9 GFLOPS, runtime 8709 seconds)**
Ne:  10.606 ns/day (11.7  GFLOPS, runtime 8146 seconds) *
Ne:  12.375 ns/day (13.7  GFLOPS, runtime 6982 seconds)**
Ne:  12.385 ns/day (13.7  GFLOPS, runtime 6976 seconds)****
Ta:  10.825 ns/day (11.9  GFLOPS, runtime 7981 seconds)***
V :   10.560 ns/day (11.7  GFLOPS, runtime 8182 seconds)***
*no external blas/lapack.
**using ACML libs
*** using openblas
**** using ATLAS

Gromacs --single (1 ns 6x6x6 nm tip4p water box; dynamic load balancing, single precision, 500 k steps)
B  :   17.251 ns/day (19.0 GFLOPS, runtime 5008 seconds)***
Ne:   21.874 ns/day (24.2 GFLOPS, runtime  3950 seconds)**
Ne:   21.804 ns/day (24.1 GFLOPS, runtime 3963  seconds)****
Ta:   17.345 ns/day (19.2 GFLOPS, runtime  4982 seconds)***
V :   17.297 ns/day (19.1 GFLOPS, runtime 4995 seconds)***
*no external blas/lapack.
**using ACML libs
*** using openblas
**** using ATLAS

NWChem (opt biphenyl cation, cp-md/pspw):
B  :   5951 seconds**
B  :   4084 seconds ***
B  :   5782 seconds ***xy
Ne:    3689 seconds**
Ta :   4102 seconds***
Ta :   4230 seconds***xy
V :    5396 seconds***

*no external blas/lapack.
**using ACML libs
*** using openblas
x Reconfigured using getmem.nwchem
y NWChem 6.1.1

NWChem (opt biphenyl cation, geovib, 6-31G**/ub3lyp):
B  :  2841 seconds **
B  :  2410 seconds***
B  :  2101 seconds ***x
B  :  2196 seconds ***xy
Ne: 1665 seconds **
Ta : 1785 seconds***
Ta : 1789 seconds***xy
V  : 2600 seconds***

*no external blas/lapack.
**using ACML libs
*** using openblas
x Reconfigured using getmem.nwchem
y NWChem 6.1.1

A Certain Commercial Ab Initio Package (Freq calc of pre-optimised H14C19O3 at 6-31+G*/rb3lyp):
B  :    2h 00 min (CPU time 10h 37 min)
Ne:   1h 37 min (CPU time: 11h 13 min)
Ta:   1h 26 min (CPU time: 5h 27 min)
V  :   2h 15 min (CPU time 15h 50 min)
Using precompiled binaries.


Gamess:
(I'm still working on learning how to run gamess efficiently, so take these values with a huge saucer of salt for now). bn.inp does a geometry optimisation of a biphenyl cation (mult 2) at ub3lyp/6-31G**. bn.inp has no $STATPT card while bn3.inp does and it makes a huge difference -- but is this because it does 20 steps (nsteps=20), then kills the run? The default is 50 steps and it does seem like all the runs do the maximum number of steps, then exit.

 Again, still learning. See below for input files. Will fix this post as I learn what the heck I'm doing. The relative run times on each node are still comparable though, but just don't use the numbers to compare the run speed of e.g. nwchem vs gamess.

Gamess using bn.inp with atlas
B:    9079 seconds
Ne: 7252 seconds
Ta:  9283 seconds

Gamess using bn.inp with openblas
B:   9071 seconds
Ta: 9297 seconds

Gamess using bn.inp with acml
Ne: 7062 seconds

Gamess using bn3.inp with atlas. 
B: 4016 seconds
Ne: 3162 seconds
Ta: 4114 seconds

MPQC:
Here I've used the version in the debian repos. I've created a hostfile
neon slots=8 max_slots=8
tantalum slots=4 max_slots=4
boron slots=6 max_slots=6

and then just looked at changing the order and slots assignment as well as total number of cores assigned using mpirun.
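For reference, an invocation looked something like this (a sketch -- 'benchmark.in' is a hypothetical input file name, and mpqc is the binary from the debian repos):

mpirun -np 8 --hostfile ~/hostfile mpqc benchmark.in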

Simple test case looking at number of cores/distribution:
n cores : seconds : configuration (cores, exec nodes)
 4 :  11 : 4(Ta)
 4 :  17 : 4(Ne)
 4 :  17 : 4(B)
 4 :  42 : 2(Ta)+2(B)
 6 :  12 : 6(B)
 6 :  13 : 6(Ne)
 6 :  74 : 2(Ta)+2(B)+2(Ne)
 8 :  12 : 8(Ne)
10 :  43 : 4(Ta)+6(B)
12 :  47 : 4(Ta)+8(Ne)
14 :  55 : 6(B)+8(Ne)
18 : 170 : 4(Ta)+6(B)+8(Ne)

My beowulf cluster doesn't seem to be much of a super computer. All in all, this looks like a pretty good argument in favour of upgrading to infiniband...


bn.inp:
 $CONTRL 
COORD=CART UNITS=ANGS scftyp=uhf dfttyp=b3lyp runtyp=optimize 
ICHARG=1 MULT=2 maxit=100
$END
 $system mwords=2000 $end
 $BASIS gbasis=n31 ngauss=6 ndfunc=1 npfunc=1 $END
 $guess guess=huckel $end

 $DATA
biphenyl
C1
C      6.0      0.0000000000   -3.5630100000    0.0000000000 
C      6.0     -1.1392700000   -2.8592800000   -0.3938400000 
C      6.0     -1.1387900000   -1.4654500000   -0.3941500000 
C      6.0      0.0000000000   -0.7428100000    0.0000000000 
C      6.0      1.1387900000   -1.4654500000    0.3941500000 
C      6.0      1.1392700000   -2.8592800000    0.3938400000 
C      6.0      0.0000000000    0.7428100000    0.0000000000 
C      6.0      1.1387900000    1.4654500000   -0.3941500000 
C      6.0      1.1392700000    2.8592800000   -0.3938400000 
C      6.0     -1.1387900000    1.4654500000    0.3941500000 
C      6.0      0.0000000000    3.5630100000    0.0000000000 
C      6.0     -1.1392700000    2.8592800000    0.3938400000 
H      1.0      0.0000000000   -4.6489600000    0.0000000000 
H      1.0     -2.0282700000   -3.3966200000   -0.7116100000 
H      1.0     -2.0214800000   -0.9282700000   -0.7279300000 
H      1.0      2.0282700000   -3.3966200000    0.7116100000 
H      1.0      2.0282700000    3.3966200000   -0.7116100000 
H      1.0     -2.0214800000    0.9282700000    0.7279300000 
H      1.0      0.0000000000    4.6489600000    0.0000000000 
H      1.0     -2.0282700000    3.3966200000    0.7116100000 
H      1.0      2.0214800000    0.9282700000   -0.7279300000 
H      1.0      2.0214800000   -0.9282700000    0.7279300000 
 $END


bn3.inp:
$CONTRL 
COORD=CART UNITS=ANGS scftyp=uhf dfttyp=b3lyp runtyp=optimize 
ICHARG=1 MULT=2 maxit=100
$END
 $system mwords=2000 $end
 $BASIS gbasis=n31 ngauss=6 ndfunc=1 npfunc=1 $END
 $STATPT OPTTOL=0.0001 NSTEP=20 HSSEND=.TRUE. $END
 $guess guess=huckel $end

 $DATA
biphenyl
C1
C      6.0      0.0000000000   -3.5630100000    0.0000000000 
C      6.0     -1.1392700000   -2.8592800000   -0.3938400000 
C      6.0     -1.1387900000   -1.4654500000   -0.3941500000 
C      6.0      0.0000000000   -0.7428100000    0.0000000000 
C      6.0      1.1387900000   -1.4654500000    0.3941500000 
C      6.0      1.1392700000   -2.8592800000    0.3938400000 
C      6.0      0.0000000000    0.7428100000    0.0000000000 
C      6.0      1.1387900000    1.4654500000   -0.3941500000 
C      6.0      1.1392700000    2.8592800000   -0.3938400000 
C      6.0     -1.1387900000    1.4654500000    0.3941500000 
C      6.0      0.0000000000    3.5630100000    0.0000000000 
C      6.0     -1.1392700000    2.8592800000    0.3938400000 
H      1.0      0.0000000000   -4.6489600000    0.0000000000 
H      1.0     -2.0282700000   -3.3966200000   -0.7116100000 
H      1.0     -2.0214800000   -0.9282700000   -0.7279300000 
H      1.0      2.0282700000   -3.3966200000    0.7116100000 
H      1.0      2.0282700000    3.3966200000   -0.7116100000 
H      1.0     -2.0214800000    0.9282700000    0.7279300000 
H      1.0      0.0000000000    4.6489600000    0.0000000000 
H      1.0     -2.0282700000    3.3966200000    0.7116100000 
H      1.0      2.0214800000    0.9282700000   -0.7279300000 
H      1.0      2.0214800000   -0.9282700000    0.7279300000 
 $END