Showing posts with label frequency. Show all posts

15 May 2013

414. Frequency vs cores? Crude benchmarking on AMD FX 8150

I'm thinking about building my next computational node, and one issue which is preoccupying me is whether to go for lots of cores (e.g. a dual-socket mobo with two 16-core 2.1 GHz cpus) or for a balance of cores and frequency (e.g. a single-socket mobo with a 3.8 GHz 8-core cpu). Remember, this is built with private money -- not research grants -- so the budget is tight.

I mean, I can't look at something like this without wanting to buy it: http://www.newegg.com/Product/Product.aspx?Item=N82E16819113036. The question is whether I'm better off buying another one or two fx8150 for the price of 16x2 down-clocked cores.

Benchmarking with the FX 8150 actually makes some sense here if one of the newegg reviewers is to be believed, since the Opteron 6272 is described as two 8150s glued together and down-clocked.

The system: 32 gb ram, fx 8150, nwchem 6.1.1 with acml 5.3.1 (gfortran,int64, fma4) and openmpi.

Short of finding benchmarks for the type of applications that interest me (nwchem, mostly), I figure I could get a rough idea by throttling the frequency of my eight-core FX8150 and comparing with unthrottled runs where the number of cores is limited.

Three things to take into account when looking at the times below:
  • modern processors are complex beasts -- I don't claim to fully understand threads vs virtual threads and integer vs FPU. In the FX8150 there are four fpus but eight cores. What this really means in practical terms when doing these particular test calculations, I don't know.
  • This isn't my job, and I need my nodes for running job-related calcs, so by necessity I had to use a short test job. There's inevitably some variability in the results, and using longer test jobs might affect the results somewhat.
  • The execution times vary A LOT for 'identical' conditions (see raw data), which is why I repeated the key runs, including all of those at 3.6 GHz, ten times each to get reasonably solid comparison values. It's still not perfect, since the distribution isn't properly gaussian.

The specific question I wanted answered is:
Are 8 threads at 2.1 GHz significantly better than 4 threads at 3.6 GHz?
Short answer: No.
Looks like I won't be investing in 2 x 16 core 2.1 GHz cpus after all.


Optimization
c/f     3.60    3.30    2.70    2.10    1.40
8       44/3    49/6    58/1    75/6    110/5  
7       48/3                     72
6       52/1                    106
5       59/4            85       97
4       67/8            93     113/10    156
3       85/7
2      117/10
1      237/24
c = number of cores; f = frequency in GHz.
Times are in seconds; 44/3 means 44 s +/- 3 s.

The way I read this is that it's better to have a 4-core 3.6 GHz cpu than an 8-core 2.1 GHz CPU. The whole 4 FPU/8 cores has me confused though, so I'm not sure whether that's affecting the results in a significant way.

The other thing to take into account is that there isn't normally a linear relationship between number of cores and execution times anyway -- doubling the number of cores doesn't normally lead to a halving of the execution time, so 16 cores at 2.10 GHz wouldn't necessarily be anywhere near 75/2=37 s. (again, that's ignoring the 2 cores/1 fpu issue)
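To put numbers on the non-linear scaling, here is a minimal Python sketch that turns the 3.6 GHz column of the table into speedup and parallel efficiency relative to the single-core run. The names `times_3p6` and `scaling` are made up for this illustration; only the mean times come from the table.

```python
# Mean wall times (s) at 3.6 GHz, copied from the table above.
times_3p6 = {1: 237, 2: 117, 3: 85, 4: 67, 5: 59, 6: 52, 7: 48, 8: 44}


def scaling(times):
    """Return {cores: (speedup, efficiency)} relative to the 1-core time."""
    t1 = times[1]
    return {c: (t1 / t, t1 / (t * c)) for c, t in sorted(times.items())}


for cores, (speedup, eff) in scaling(times_3p6).items():
    print(f"{cores} cores: speedup {speedup:4.2f}x, efficiency {eff:4.0%}")
```

Going from 4 to 8 cores at 3.6 GHz only takes the run from 67 s to 44 s, i.e. about 1.5x rather than 2x, and 8 cores end up at roughly 5.4x the single-core speed (about 67% efficiency). Given the spread in the timings, treat these as rough figures.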

-------------
c/f: raw data
--------------
8/3.6: 37.7,47.4,46.9,38.8, 46.8, 42.4,46.6, 43.9,44.7,42.8 => 44+/-3 s
7/3.6: 41.3,48.7,47.9,48.8,47.0,48.8,50.8,42.4,52.1,47.9 => 48+/-3 s
6/3.6: 49.5,53.4,50.5,53.4,52.4,53.3,51.3,53.4,52.5,53.55 => 52+/-1 s
5/3.6: 54.1,57.1, 67.7,52.2,59.6,58.4,59.8,57.6,59.4,58.6 => 59+/-4 s
4/3.6: 83.1,63.5,73.7,70.0,68.6,58.1,58.1,67.2,69.9,58.2 => 67 +/-8 s
3/3.6: 89.5, 86.0, 82.8, 97.9, 74.4,86.2,89.7, 86.3, 74.5, 86.2 => 85 +/-7 s
2/3.6: 114.1,137.4, 118.6, 108.3, 116.3, 123.6, 104.4,124.3,104.7, 120.6 => 117+/-10 s
1/3.6: 242.6,201.9,232.9,242.7, 233.2,202.0,233.1,265.2, 278.9,233.5 => 237+/- 24
8/3.3: 51.9, 42.4,42.7,55.3,43.3,55.8,54.6,48.1,42.4,48.1 => 49+/-6 s
8/2.7: 59.4, 57.3,59.1,57.8,58.9,56.8,59.0,58.5,59.2,56.9 => 58+/-1
8/2.1: 75.6,82.9,73.7,65.1,76.9,84.3,65.4,73.9,76.4,78.1 => 75+/-6 s
8/1.4: 112.5,110.5,112.1,108.6,113.1,114.4,112.4,109.1,97.9 => 110+/-5
4/2.1: 124.9,103.7,104.1, 92.4, 117.6,115.5,117.5,120.1,115.6,120.2 => 113+/-10 s

An alternative would be to report the fastest time (out of e.g. 10 tries) since it represents maximum capacity.
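As a rough illustration of both ways of summarising the runs, the sketch below computes the mean, the spread and the fastest time for three of the raw data sets above. It assumes the +/- values are sample standard deviations, which is a guess on my part; `raw` and `summarise` are names invented for this example.

```python
from statistics import mean, stdev

# Raw wall times (s) copied from the list above; keys are cores/GHz.
raw = {
    "8/3.6": [37.7, 47.4, 46.9, 38.8, 46.8, 42.4, 46.6, 43.9, 44.7, 42.8],
    "4/3.6": [83.1, 63.5, 73.7, 70.0, 68.6, 58.1, 58.1, 67.2, 69.9, 58.2],
    "8/2.1": [75.6, 82.9, 73.7, 65.1, 76.9, 84.3, 65.4, 73.9, 76.4, 78.1],
}


def summarise(runs):
    """Return (rounded mean, rounded sample standard deviation, fastest run)."""
    return round(mean(runs)), round(stdev(runs)), min(runs)


for label, runs in raw.items():
    avg, spread, best = summarise(runs)
    print(f"{label}: {avg} +/- {spread} s, fastest {best} s")
```

Judged by the fastest run, the ordering stays the same: 37.7 s for 8 cores at 3.6 GHz, 58.1 s for 4 cores at 3.6 GHz and 65.1 s for 8 cores at 2.1 GHz.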



optimization input
scratch_dir /scratch
start benzeneopt 

geometry units angstroms
C  0.100  1.396  0.000
C  1.209  0.698  0.000
C  1.209 -0.698  0.000
C  0.000 -1.396  0.000
C -1.209 -0.698  0.000
C -1.209  0.698  0.000
H  0.000  2.479  0.000
H  2.147  1.240  0.000
H  2.147 -1.240  0.000
H  0.000 -2.479  0.000
H -2.147 -1.240  0.000
H -2.147  1.240  0.000
end

basis
 H library "6-31+g*" 
 c library "6-31+g*"
end
dft
 direct
end

task dft optimize



Setting frequency
The following script was called with the frequency in GHz, e.g. sudo setfreq 3.6

setfreq
#!/bin/sh
# Usage: sudo setfreq 3.6   (frequency in GHz, applied to all eight cores)
for cpu in 0 1 2 3 4 5 6 7; do
    /usr/bin/cpufreq-set -c "$cpu" -g userspace   # switch to the userspace governor
    /usr/bin/cpufreq-set -c "$cpu" -f "${1}G"     # pin this core to the requested frequency
done

25 June 2012

200. How long will your nwchem frequency calc take?

Update 19/12/12: Having done a lot more frequency calculations since I posted this I sincerely doubt that this approach works.

Original post
Because I'm stuck waiting for the results of frequency calcs on some large transition metal clusters, I've become interested in understanding the output of frequency calculations in progress. After all, why wait 15 days for results if there are early signs that the calculation has gone haywire?

Also, it might just be me, but frequency calculations are not that easy to restart, so you want to make sure that you give them enough wall time to finish if you use a queue manager.

I'm sure most of this could be appreciated by RTFM, but who has time for that?

So this is what the calc does:
After the usual boredom of reading in the geometry and doing an energy calculation, followed by an MO analysis, the computational fun starts.

Each cycle contains the following reports:


  1. Total Density - Mulliken Population Analysis
  2. Spin Density - Mulliken Population Analysis
  3. Total Density - Lowdin Population Analysis
  4. Spin Density - Lowdin Population Analysis
  5. Expectation value of S2:  
  6. NWChem DFT Module
  7.   Caching 1-el integrals 
  8.       Total Density - Mulliken Population Analysis
  9.       Spin Density - Mulliken Population Analysis
with the exception of the first cycle, which also looks at the alpha-beta orbital overlaps, the centre of mass and the moments of inertia, does a multipole analysis of the density, and saves an initial hessian.





Each cycle ends with a report of the energy for that vibration:


         Total DFT energy =    -3297.032399945703
      One electron energy =   -26618.764098759657
           Coulomb energy =    12938.745973154924
    Exchange-Corr. energy =     -382.742230868704
 Nuclear repulsion energy =    10765.727956527733

 Numeric. integr. density =      441.999974968347

     Total iterative time =   7947.4s

If you do cat nwch.nwout|egrep "Total iterative time|Total DFT energy" you can see the progress:
        Total DFT energy =    -3297.032416366805
     Total iterative time =  12146.0s
         Total DFT energy =    -3297.032399945703
     Total iterative time =   7947.4s
         Total DFT energy =    -3297.032399544749
     Total iterative time =   7946.0s
         Total DFT energy =    -3297.032406934719
     Total iterative time =   7945.8s
         Total DFT energy =    -3297.032405026814

You now have an idea of how long each step takes. But how many steps in total? I think it's 3N steps, where N is the number of atoms.

For my 50-atom POM using the values above it'd be roughly 8000 s * 150 = 1,200,000 s, or just under 14 days.

Which seems about right...

cat nwch.nwout|grep 'Total DFT'|gawk 'END {print NR}'
66
so I've got another 8 days before I can get my hands on some juicy thermochemical data...
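The back-of-the-envelope estimate above can be sketched in Python as follows. This is only an illustration under the same 3N-steps guess, so it inherits the doubts in the update at the top of the post. The function names are invented here, and using the median step time is my own choice to dampen the slower first step.

```python
import re
from statistics import median


def step_times(log_text):
    """Seconds reported on each 'Total iterative time' line of the output."""
    pattern = r"Total iterative time\s*=\s*([0-9.]+)s"
    return [float(t) for t in re.findall(pattern, log_text)]


def remaining_days(steps_done, n_atoms, seconds_per_step, steps_per_atom=3):
    """Days left, assuming steps_per_atom * n_atoms steps in total (the 3N guess)."""
    steps_left = max(steps_per_atom * n_atoms - steps_done, 0)
    return steps_left * seconds_per_step / 86400.0


# The egrep output quoted above, used here as sample input.
sample = """\
        Total DFT energy =    -3297.032416366805
     Total iterative time =  12146.0s
         Total DFT energy =    -3297.032399945703
     Total iterative time =   7947.4s
         Total DFT energy =    -3297.032399544749
     Total iterative time =   7946.0s
         Total DFT energy =    -3297.032406934719
     Total iterative time =   7945.8s
         Total DFT energy =    -3297.032405026814
"""

typical = median(step_times(sample))  # less sensitive to the slow first step than the mean
print(f"typical step: {typical:.1f} s")
print(f"remaining: {remaining_days(66, 50, 8000):.1f} days")
```

With 66 steps done, 50 atoms and 8000 s per step this gives about 7.8 days, in line with the "another 8 days" above.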

Time to start preparing lectures...




19 June 2012

195. Frequency calcs in NWChem

It's no secret that I'm a computational 'noob'. As such, I'm learning both by reading and by doing.

The doing part consists of checking 1) what the time penalty for different methods is and 2) what the accuracy/differences between different methods are.

Again, these are short calculations for simple molecules. Longer calculations with more exciting features (unpaired electrons, closely spaced MOs, highly negative charges etc.) may well behave completely differently.

Today's focus is vibrational calcs.

Test Molecule: CHClF(OH) (chloro-fluoro-methanol)
Title "Freq_test"

Start  Freq_test

echo

charge 0

geometry noautosym units angstrom
 C     0.0416942     -0.501783     0.399137
 H     0.0442651     -0.499048     1.48122
 O     1.21393     -1.00985     -0.0746688
 H     1.25125     -0.957351     -1.06923
 F     -1.08480     -1.08768     -0.134571
 Cl     -0.120345     1.41214     -0.0717951
end

ecce_print ecce.out

basis "ao basis" cartesian print
  H library "3-21G"
  F library "3-21G"
  Cl library "3-21G"
  O library "3-21G"
  C library "3-21G"
END

dft
  mult 1
  odft
  mulliken
end

task dft energy
task dft freq

All geometries were optimised in the gas phase using 3-21G.

0. Some useful statements:
hessian      print "hess_follow"
                 profile
end
1. Basis set (geometry optimised in 3-21G)
Neutral:
basis/method        time     enthalpy          entropy            scfe (Hartree)
3-21G                 81 s   24.984 kcal/mol   69.235 cal/mol-K   -671.17956992206
6-31G                105 s   21.885 kcal/mol   68.793 cal/mol-K   -674.478768966106
6-31++G**            399 s   21.734 kcal/mol   68.818 cal/mol-K   -674.573524091623
cc-pVDZ              325 s   21.682 kcal/mol   68.819 cal/mol-K   -674.594059146606
aug-cc-pVDZ          901 s   21.605 kcal/mol   68.840 cal/mol-K   -674.623145113155

LANL2DZ(C)/6-31+G*   262 s   24.923 kcal/mol   68.981 cal/mol-K   -674.539040349134
UHF/aug-cc-pVDZ      373 s   26.196 kcal/mol   68.228 cal/mol-K   -672.85402652170

Cation:
basis/method        time     enthalpy          entropy            scfe (Hartree)
3-21G                ---     21.164 kcal/mol   74.407 cal/mol-K   -670.763278724519
6-31G                142 s   21.153 kcal/mol   74.645 cal/mol-K   -674.089132280731
6-31++G**            637 s   21.192 kcal/mol   73.768 cal/mol-K   -674.178146586266
cc-pVDZ              399 s   21.153 kcal/mol   73.736 cal/mol-K   -674.210312017948
aug-cc-pVDZ         1776 s   21.089 kcal/mol   73.774 cal/mol-K   -674.228204222891

LANL2DZ(C)/6-31+G*   454 s   24.795 kcal/mol   74.293 cal/mol-K   -674.140922359750
UHF/aug-cc-pVDZ      741 s   26.002 kcal/mol   72.462 cal/mol-K   -672.518095855130

2. Thermochemistry (ΔG of oxidation; gas phase)
(thermal correction to ΔG + difference in scfe, cation minus neutral = potential in V*)
3-21G:               -5.3620 kcal/mol + 261.22 kcal/mol = 6.814 V
6-31G:               -2.4768 kcal/mol + 244.50 kcal/mol = 6.214 V
6-31++G**:           -2.0178 kcal/mol + 248.10 kcal/mol = 6.390 V
cc-pVDZ:             -1.9950 kcal/mol + 240.80 kcal/mol = 6.075 V
aug-cc-pVDZ:         -1.9871 kcal/mol + 247.83 kcal/mol = 6.380 V

LANL2DZ(C)/6-31+G*:  -1.7118 kcal/mol + 249.82 kcal/mol = 6.478 V
UHF/aug-cc-pVDZ:     -1.4564 kcal/mol + 210.80 kcal/mol = 4.797 V

* vs SHE=4.281 eV
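For reference, the arithmetic behind these numbers can be reproduced with the sketch below, which recovers the 3-21G row from the enthalpy, entropy and scfe values in section 1. The temperature (298.15 K), the one-electron oxidation and the conversion factors are my assumptions; they reproduce the tabulated value, but the post does not state them. The function name is invented for this example.

```python
KCAL_PER_HARTREE = 627.509  # 1 Hartree in kcal/mol
KCAL_PER_EV = 23.0605       # 1 eV in kcal/mol
SHE_EV = 4.281              # absolute SHE potential used in the post
TEMPERATURE = 298.15        # K (assumed, not stated in the post)


def oxidation_potential(neutral, cation, n_electrons=1):
    """Each species is (enthalpy kcal/mol, entropy cal/mol-K, scfe Hartree).

    Returns (thermal correction, scfe difference, potential vs SHE), with the
    first two in kcal/mol and the potential in V.
    """
    h0, s0, e0 = neutral
    h1, s1, e1 = cation
    dg_thermal = (h1 - h0) - TEMPERATURE * (s1 - s0) / 1000.0
    de_scf = (e1 - e0) * KCAL_PER_HARTREE
    dg_ev = (dg_thermal + de_scf) / KCAL_PER_EV
    return dg_thermal, de_scf, dg_ev / n_electrons - SHE_EV


# 3-21G gas-phase values from section 1.
neutral_321g = (24.984, 69.235, -671.17956992206)
cation_321g = (21.164, 74.407, -670.763278724519)

dg, de, potential = oxidation_potential(neutral_321g, cation_321g)
print(f"{dg:.4f} kcal/mol + {de:.2f} kcal/mol = {potential:.3f} V")
```

The thermal term is the difference in enthalpy minus T times the difference in entropy; the sum is converted from kcal/mol to eV and the SHE value is subtracted.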

3. Solvation (cosmo/water/scfe)
Neutral:
basis/method        time          enthalpy          entropy            scfe (Hartree)
3-21G                 66 s        22.097 kcal/mol   68.875 cal/mol-K   -671.1936338426
6-31G                 82 s        22.277 kcal/mol   68.609 cal/mol-K   -674.4934780299
6-31++G**            277 s        21.493 kcal/mol   69.353 cal/mol-K   -674.586704959695
cc-pVDZ              266 s        21.869 kcal/mol   68.808 cal/mol-K   -674.605608009070
aug-cc-pVDZ          712 s        22.116 kcal/mol   69.596 cal/mol-K   -674.635237990779

LANL2DZ(C)/6-31+G*   180 s        25.022 kcal/mol   69.073 cal/mol-K   -674.552417717602
UHF/aug-cc-pVDZ      412 s        24.083 kcal/mol   70.519 cal/mol-K   -672.868085966222

Cation (solvation energy)**:
basis/method        time          enthalpy          entropy            scfe (Hartree)
3-21G                ---/26 s     21.164 kcal/mol   74.407 cal/mol-K   -670.881469242560
6-31G                142 s/51 s   21.153 kcal/mol   74.645 cal/mol-K   -674.175491218588
6-31++G**            637 s/111 s  21.192 kcal/mol   73.768 cal/mol-K   -674.267298880087
cc-pVDZ              399 s/129 s  21.153 kcal/mol   73.736 cal/mol-K   -674.294609415029
aug-cc-pVDZ         1776 s/311 s  21.089 kcal/mol   73.774 cal/mol-K   -674.316552324118

LANL2DZ(C)/6-31+G*   454 s        24.795 kcal/mol   74.293 cal/mol-K   -674.232656980139
UHF/aug-cc-pVDZ      741 s        26.002 kcal/mol   72.462 cal/mol-K   -672.451040948823
** UHF can't be used with COSMO according to nwchem. Instead we use the cation thermo calcs in the gas phase and use the scfe from a cosmo calc.

Thermochemistry*** (values in parentheses use the gas phase freq calcs for both cation and neutral species, with the scfe from cosmo):

3-21G:            -2.5824+195.88=  4.101 V (3.981 V)
6-31G:            -2.9236+199.54=  4.245 V (4.265 V)
6-31++G**:   -1.6173+200.43=  4.341 V (4.324 V)
cc-pVDZ:       -2.1853+195.15=  4.087 V (4.095 V)
aug-cc-pVDZ: -2.2727+199.98= 4.293 V (4.305 V)

LANL2DZ(C)/6-31+G*  -0.41322+200.65= 4.402
UHF/aug-cc-pVDZ 1.3397+261.7 (!)= 7.126
* vs SHE=4.281 eV

*** using freq calc of neutral species with cosmo, vs freq calc of cation in gas phase and energy w/ cosmo

4. Spectra
We'll use octave for this. First, using cat and gawk, I put the x/y coordinates in a file.

% Gaussian line shape: intensity i centred at frequency f with width sigma
gauss = @(x,f,i,sigma) i.*1./(sigma.*sqrt(2*pi)).*exp(-0.5.*((x-f)./sigma).^2);
x = linspace(0,4000,800);
files  = {'321g.spc','ccpvdz.spc','631g.spc','augccpvdz.spc','631gppdd.spc'};
titles = {'321g','ccPVDZ','631g','aug-ccPVDZ','631++g**'};
for k = 1:numel(files)
  spc = load(files{k}); sf = spc(:,1); si = spc(:,2);
  spec = sum(gauss(x,sf,si,75), 1);   % add up the broadened peaks of all modes
  subplot(3,2,k); plot(x,spec); title(titles{k});   % title goes after plot so plot can't clear it
end

From top to bottom: Left: 3-21G, 6-31G, 6-31++G**. Right: cc-pVDZ, aug-cc-pVDZ
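For anyone without octave, the same broadening can be sketched in plain Python. The stick spectrum here is made up purely for illustration and is not data from these calculations; the grid is also slightly different (801 points, 5 cm-1 apart) to keep the numbers round. The 75 cm-1 width matches the octave code.

```python
import math


def gaussian(x, centre, intensity, sigma):
    """Normalised Gaussian of area `intensity` centred at `centre`."""
    norm = intensity / (sigma * math.sqrt(2 * math.pi))
    return norm * math.exp(-0.5 * ((x - centre) / sigma) ** 2)


def broaden(sticks, x_values, sigma=75.0):
    """Sum one Gaussian per (frequency, intensity) stick at every x."""
    return [sum(gaussian(x, f, i, sigma) for f, i in sticks) for x in x_values]


# Hypothetical stick spectrum (cm^-1, arbitrary intensity); NOT from the post.
sticks = [(1000.0, 1.0), (3000.0, 0.5)]
x_values = [i * 5.0 for i in range(801)]  # 0 to 4000 cm^-1
spectrum = broaden(sticks, x_values)

peak_index = spectrum.index(max(spectrum))
print(f"highest point at {x_values[peak_index]:.0f} cm^-1")
```

Each row of the .spc file (frequency, intensity) plays the role of one entry in `sticks`.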
5. Conclusions
It may seem weird that as a test case I picked a species I don't have any reference potential for. However, the goal here was to understand how the basis set affects the results, without being distracted by such things as Real Life.

The observed spectra can be divided into two groups: 3-21G/6-31G vs 6-31++G**/cc-pVDZ/aug-cc-pVDZ. Polarization (and diffuse functions) seem to play a large role.

In terms of thermochemistry, not surprisingly aug-cc-pVDZ and 6-31++G** give very similar results since they both implement pol/diff functions. The computational cost is, however, significantly higher for aug-cc-pVDZ than 6-31++G**, at least in nwchem.

There is also little difference between doing freq calculations in gas phase vs using cosmo when it comes to the calculated redox potential for the more extensive basis sets.

3-21G gives inconsistent results: it yields the highest potential in the gas phase but the second lowest potential with cosmo. cc-pVDZ consistently gives the lowest potential.

UHF/ROHF/HF are fast, but wildly inaccurate. LANL2DZ/6-31+G* looks ok, results-wise, but the thermodynamic corrections are actually much smaller in conjunction with COSMO than the other methods, which is suspicious.

If given the time I may post a more detailed analysis of polarisation vs diffuse functions later.