Author Topic: Comparing Total CPU-time for different density functionals (Read 3690 times)

prasanta13 · « **on:** January 31, 2022, 01:50:41 PM »

Hi there,
I am trying to measure and tabulate the time needed for a single-point energy calculation of a molecule with different density functionals. The DFT methods I use are B2-PLYP, B3LYP (with/without -D2 and -D3), M06-2X and PBE (with/without -D2, -D3 (and ABC) and -D4), all with def2-QZVPPD.
However, the results are inconsistent: B3LYP takes significantly longer (cpu-time = ~1 day 4 hours) than B3LYP-D3 (cpu-time = ~22 hours). I have no idea why.
In addition, I ran the same job three times in a row, using the same control, coord, etc. in different directories, with the same method and basis.
The runtimes (all cpu-time) were: 1. 63490 mins, 2. 85263 mins, 3. 87830 mins.
This is very confusing. Why is there such a large time difference?
I have not changed anything in bash (parallel settings), nor have I used different commands to call ridft. Every time I used nohup ridft > ridft.out | tail -f ridft.out
There is no such difference in the energies, though.

What is the problem? Can anyone help?
Thanks in advance and best regards.

Arnim · « **Reply #1 on:** March 02, 2022, 10:03:26 PM »

Hi,
did you check the number of SCF cycles?
Also, do you run the job on a local scratch disk? And are there maybe other jobs running on the machine?

Cheers,
Arnim

prasanta13 · « **Reply #2 on:** March 08, 2022, 08:48:39 AM »

Actually, the same number of SCF iterations was performed.
It was the same computer, with nothing but Turbomole running.
I didn't set up any scratch directory explicitly, neither local nor remote.

Thanks Arnim.

uwe · « **Reply #3 on:** March 10, 2022, 10:02:51 AM »

Hello,

did you use the parallel SMP version? Then the only valid output for timing is the wall time, which is the 'real' time the job needs. The CPU time is accumulated over all running threads and depends on a lot of factors; in my experience the numbers are very often meaningless...
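
To make the distinction concrete, here is a minimal Python sketch that times an external command and reports both the wall time and the CPU time accumulated by the child process over all of its threads. The helper name `time_command` is hypothetical, the sketch assumes a Unix-like system (where `os.times()` reports child CPU time), and the command shown is a harmless stand-in; for a real benchmark one would pass something like `["ridft"]`, assuming it is on the PATH.

```python
import os
import subprocess
import sys
import time


def time_command(argv, out_path=os.devnull):
    """Run argv and return (wall_seconds, child_cpu_seconds, returncode).

    child_cpu_seconds is user + system time of the finished child process,
    summed over all of its threads, so it can far exceed the wall time.
    """
    before = os.times()
    start = time.perf_counter()
    with open(out_path, "w") as out:
        completed = subprocess.run(argv, stdout=out, stderr=subprocess.STDOUT)
    wall = time.perf_counter() - start
    after = os.times()
    cpu = (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )
    return wall, cpu, completed.returncode


# Stand-in command (assumption): a short single-threaded Python job.
wall, cpu, rc = time_command([sys.executable, "-c", "sum(range(10**6))"])
print(f"wall = {wall:.2f} s, cpu (all threads) = {cpu:.2f} s, exit code = {rc}")
```

For a job running on N busy threads, the CPU figure is expected to be roughly N times the wall figure, which is why only the wall time is comparable between runs.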

prasanta13 · « **Reply #4 on:** March 10, 2022, 01:20:39 PM »

So I did the meaningless thing by taking the cpu-time for parallel run...

Thanks for the help Uwe...
Cheers

uwe · « **Reply #5 on:** March 10, 2022, 02:58:26 PM »

Hi,
can you share your wall time numbers here, so that we can see whether they are more consistent?
Cheers, Uwe

prasanta13 · « **Reply #6 on:** March 11, 2022, 06:10:22 AM »

Sure, here are both the wall time and the CPU time for a benzene dimer single-point calculation with B3LYP/def2-QZVPPD. I ran the same job five times; all results are given below.
1. total cpu-time : 20 hours 54 minutes and 32 seconds
   total wall-time : 23 minutes and 42 seconds
2. total cpu-time : 1 days 7 hours 27 minutes and 4 seconds
   total wall-time : 31 minutes and 35 seconds
3. total cpu-time : 22 hours 50 minutes and 16 seconds
   total wall-time : 25 minutes and 9 seconds
4. total cpu-time : 21 hours 10 minutes and 33 seconds
   total wall-time : 24 minutes and 16 seconds
5. total cpu-time : 21 hours 4 minutes and 11 seconds
   total wall-time : 23 minutes and 52 seconds
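
As a rough cross-check of the five runs above, the ratio of CPU time to wall time approximates the average number of busy threads. The sketch below parses the timing strings as printed by ridft and computes that ratio; the helper `to_seconds` and its regular expression are illustrative assumptions, not part of Turbomole.

```python
import re

UNIT_SECONDS = {"day": 86400, "hour": 3600, "minute": 60, "second": 1}


def to_seconds(text):
    """Convert e.g. '1 days 7 hours 27 minutes and 4 seconds' to seconds."""
    total = 0
    for value, unit in re.findall(r"(\d+)\s+(day|hour|minute|second)s?", text):
        total += int(value) * UNIT_SECONDS[unit]
    return total


# (cpu-time, wall-time) pairs copied from the post above
runs = [
    ("20 hours 54 minutes and 32 seconds", "23 minutes and 42 seconds"),
    ("1 days 7 hours 27 minutes and 4 seconds", "31 minutes and 35 seconds"),
    ("22 hours 50 minutes and 16 seconds", "25 minutes and 9 seconds"),
    ("21 hours 10 minutes and 33 seconds", "24 minutes and 16 seconds"),
    ("21 hours 4 minutes and 11 seconds", "23 minutes and 52 seconds"),
]

ratios = []
for i, (cpu, wall) in enumerate(runs, start=1):
    ratio = to_seconds(cpu) / to_seconds(wall)
    ratios.append(ratio)
    print(f"run {i}: cpu/wall = {ratio:.1f}")
```

The ratios come out between about 52 and 60, which is consistent with a job spread over the 64 OpenMP threads reported at the top of ridft.out.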

The control file and ridft.out file for the first instance are also provided for your convenience.
control:

Code:

$title
$symmetry c1
$user-defined bonds    file=coord
$coord    file=coord
$optimize
 internal   off
 redundant  off
 cartesian  on
 global     off
 basis      off
$atoms
c  1,3,5,7,9,11,13,15,17,19,21,23                                              \
   basis =c def2-QZVPPD                                                        \
   jbas  =c universal
h  2,4,6,8,10,12,14,16,18,20,22,24                                             \
   basis =h def2-QZVPPD                                                        \
   jbas  =h universal
$basis    file=basis
$scfmo   file=mos
$closed shells
 a       1-42                                   ( 2 )
$scfiterlimit       30
$thize     0.10000000E-04
$thime        5
$scfdamp   start=0.300  step=0.050  min=0.100
$scfdump
$scfintunit
 unit=30       size=0        file=twoint
$scfdiis
$maxcor    500 MiB  per_core
$scforbitalshift  automatic=.1
$drvopt
   cartesian  on
   basis      off
   global     off
   hessian    on
   dipole     on
   nuclear polarizability
$interconversion  off
   qconv=1.d-7
   maxiter=25
$coordinateupdate
   dqmax=0.3
   interpolate  on
   statistics    5
$forceupdate
   ahlrichs numgeo=0  mingeo=3 maxgeo=4 modus=<g|dq> dynamic fail=0.3
   threig=0.005  reseig=0.005  thrbig=3.0  scale=1.00  damping=0.0
$forceinit on
   diag=default
$energy    file=energy
$grad    file=gradient
$forceapprox    file=forceapprox
$dft
   functional b3-lyp
   gridsize   m5
$scfconv   7
$ricore      500
$rij
$jbas    file=auxbasis
$rundimensions
   natoms=24
   nbf(CAO)=1404
   nbf(AO)=1152
$last step     ridft
$orbital_max_rnorm 0.37825343447967E-02
$last SCF energy change = -464.39460
$subenergy  Etot         E1                  Ej                Ex                 Ec                 En
-464.3945967316    -1873.134408000     835.7340698417    -52.77879696369    -3.277199147521     629.0617375378
$charge from ridft
          0.000 (not to be modified here)
$dipole from ridft
  x     0.00001818366549    y     0.00000086972861    z    -0.00020512736756    a.u.
   | dipole | =    0.0005234349  debye
$end

The ridft.out is:

Code:

   OpenMP run-time library returned nthreads = 64

 ridft (mozart) : TURBOMOLE rev. V7.5.0 compiled 17 Jun 2020 at 09:15:30
 Copyright (C) 2020 TURBOMOLE GmbH, Karlsruhe


    2022-01-31 16:03:58.238 



                                  r i d f t

                        DFT program with RI approximation 
                                for coulomb part 




                                                 
                                 References:     
                                                 
          TURBOMOLE:                             
              R. Ahlrichs, M. Baer, M. Haeser, H. Horn, and
              C. Koelmel
              Electronic structure calculations on workstation
              computers: the program system TURBOMOLE
              Chem. Phys. Lett. 162: 165 (1989)
          Density Functional:                              
              O. Treutler and R. Ahlrichs                      
              Efficient Molecular Numerical Integration Schemes
              J. Chem. Phys. 102: 346 (1995)                   
          Parallel Version:                                
              Performance of parallel TURBOMOLE for Density    
              Functional Calculations                          
              M. v. Arnim and R. Ahlrichs                      
              J. Comp. Chem. 19: 1746 (1998)                   
          RI-J Method:                                     
              Auxiliary Basis Sets to approximate Coulomb      
              Potentials                                       
              Chem. Phys. Lett. 240: 283 (1995)                
              K. Eichkorn, O. Treutler, H. Oehm, M. Haeser     
              and R. Ahlrichs                                  
              Chem. Phys. Lett. 242: 652 (1995)                
                                                           
              Auxiliary Basis Sets for Main Row Atoms and their
              Use to approximate Coulomb Potentials            
              K. Eichkorn, F. Weigend, O. Treutler and         
              R. Ahlrichs                                      
              Theo. Chem. Acc. 97: 119 (1997)                   
                                                           
              Accurate Coulomb-fitting basis sets for H to Rn 
              F. Weigend                                        
              Phys. Chem. Chem. Phys. 8: 1057 (2006)            
                                                           
          Multipole accelerated RI-J (MARI-J):             
              Fast evaluation of the Coulomb potential for     
              electron densities using multipole accelerated   
              resolution of identity approximation             
              M. Sierka, A. Hogekamp and R. Ahlrichs           
              J. Chem. Phys. 118: 9136 (2003)                  
          RI-JK Method:                                     
              A fully direct RI-HF algorithm: Implementation,
              optimised auxiliary basis sets, demonstration of
              accuracy and efficiency                         
              F. Weigend                                      
              Phys. Chem. Chem. Phys. 4: 4285 (2002)           
          Two-component HF and DFT with spin-orbit coupling:  
              Self-consistent treatment of spin-orbit         
              interactions with efficient Hartree-Fock and    
              density functional methods                      
              M. K. Armbruster, F. Weigend, C. van Wüllen and 
              W. Klopper                                      
              Phys. Chem. Chem. Phys. 10: 1748 (2008)         
          Two-component difference density and DIIS algorithm 
              Efficient two-component self-consistent field   
              procedures and gradients: implementation in     
              TURBOMOLE and application to Au20-              
              A. Baldes, F. Weigend                           
              Mol. Phys. 111: 2617 (2013)                     
          Relativistic all-electron 2c calculations           
              An efficient implementation of two-component    
              relativistic exact-decoupling methods for large 
              molecules                                       
              D. Peng, N. Middendorf, F. Weigend, M. Reiher   
              J. Chem. Phys. 138: 184105 (2013)               
          Finite nucleus model and SNSO approximation         
              Efficient implementation of one- and two-       
              component analytical energy gradients in exact  
              two-component theory                            
              Y. J. Franzke, N. Middendorf, F. Weigend        
              J. Chem. Phys. 148: 104110 (2018)               
          Grids for all-electron relativistic methods         
              Error-consistent segmented contracted all-      
              electron relativistic basis sets of double-     
              and triple-zeta quality for NMR shielding       
              constants                                       
              Y. J. Franzke, R. Tress, T. M. Pazdera,         
              F. Weigend                                      
              Phys. Chem. Chem. Phys. 21: 166658 (2019)       
          Seminumerical exchange algorithms                   
              Seminumerical calculation of the Hartree-Fock   
              exchange matirx: Application to two-component   
              procedures and efficient evaluation of local    
              hybrid functionsl                               
              P. Plessow, F. Weigend,                         
              J. Comput. Chem. 33: 810 (2012)                 
          Improved seminumerical algorithms                   
              C. Holzer, in preparation (2020)                
                                         





          OpenMP Shared-Memory Parallelization: 64 CPUs.

            By: Christof Holzer and Yannick J. Franzke


              +--------------------------------------------------+
              |      general information about current run       |
              +--------------------------------------------------+

 
 Becke-3-Parameter hybrid functional: B3-LYP
 exchange:    0.8*LDA + 0.72*B88 + 0.2*HF
 correlation: 0.19*LDA(VWN) + 0.81*LYP
 A Hybrid-DFT calculation using the RI-J approximation will be carried out.
 Allocatable memory for RI due to $ricore (MB):                   500


              +--------------------------------------------------+
              | Atomic coordinate, charge and isotop information |
              +--------------------------------------------------+

                    atomic coordinates            atom    charge  isotop
          1.34670449    2.11837486    0.11440550    c      6.000     0
          2.56594884    3.75375040    0.24138827    h      1.000     0
          2.37772093   -0.30094248    0.23476732    c      6.000     0
          4.39352684   -0.54254014    0.46627676    h      1.000     0
          0.80669567   -2.40850633    0.08059763    c      6.000     0
          1.60710747   -4.28671650    0.17905146    h      1.000     0
         -1.79444206   -2.09773595   -0.18956522    c      6.000     0
         -3.01308458   -3.73461605   -0.30937364    h      1.000     0
         -2.82613387    0.32323871   -0.30527773    c      6.000     0
         -4.84484732    0.56544602   -0.51722393    h      1.000     0
         -1.25445236    2.43140268   -0.15760587    c      6.000     0
         -2.05394464    4.31046568   -0.25111840    h      1.000     0
          3.68399881    2.09781576    6.80458924    c      6.000     0
          4.90257948    3.73466581    6.92506063    h      1.000     0
          4.71570069   -0.32320894    6.92084468    c      6.000     0
          6.73431915   -0.56537177    7.13390038    h      1.000     0
          3.14411094   -2.43138759    6.77221691    c      6.000     0
          3.94361950   -4.31044137    6.86604874    h      1.000     0
          0.54308895   -2.11838275    6.49874189    c      6.000     0
         -0.67597827   -3.75377110    6.37093474    h      1.000     0
         -0.48786979    0.30090652    6.37786400    c      6.000     0
         -2.50350465    0.54246872    6.14508988    h      1.000     0
          1.08301598    2.40848171    6.53295560    c      6.000     0
          0.28266995    4.28669002    6.43399267    h      1.000     0
 
       center of nuclear mass  :    0.94484663    0.00000451    3.30704123
       center of nuclear charge:    0.94484812    0.00000437    3.30703847

              +--------------------------------------------------+
              |               basis set information              |
              +--------------------------------------------------+

              we will work with the 1s 3p 5d 7f 9g ... basis set
              ...i.e. with spherical basis functions...

   type   atoms  prim   cont   basis
   ---------------------------------------------------------------------------
    c       12     83     63   def2-QZVPPD   [8s4p4d2f1g|16s8p4d2f1g]
    h       12     36     33   def2-QZVPPD   [4s4p2d1f|7s4p2d1f]
   ---------------------------------------------------------------------------
   total:   24   1428   1152
   ---------------------------------------------------------------------------

   total number of primitive shells          :   45
   total number of contracted shells         :  360
   total number of cartesian basis functions : 1404
   total number of SCF-basis functions       : 1152


 integral neglect threshold       :  0.24E-11
 integral storage threshold THIZE :  0.10E-04
 integral storage threshold THIME :         5

 RI-J AUXILIARY BASIS SET information:

              we will work with the 1s 3p 5d 7f 9g ... basis set
              ...i.e. with spherical basis functions...

   type   atoms  prim   cont   basis
   ---------------------------------------------------------------------------
    c       12     70     49   universal   [6s4p3d1f1g|12s5p4d2f1g]
    h       12     16     11   universal   [3s1p1d|5s2p1d]
   ---------------------------------------------------------------------------
   total:   24   1032    720
   ---------------------------------------------------------------------------

   total number of primitive shells          :   32
   total number of contracted shells         :  240
   total number of cartesian basis functions :  876
   total number of SCF-basis functions       :  720


 symmetry group of the molecule :   c1 

 the group has the following generators :
   c1(z)

    1 symmetry operations found

 there are 1 real representations :   a   

 maximum number of shells which are related by symmetry :  1

  
           ------------------
           density functional
           ------------------
 Becke-3-Parameter hybrid functional: B3-LYP
 exchange:    0.8*LDA + 0.72*B88 + 0.2*HF
 correlation: 0.19*LDA(VWN) + 0.81*LYP

 iterations will be done with small grid
  
 spherical integration : Lebedev's spherical grid
 spherical gridsize    :                     5
    i.e. gridpoints    :                   590
 value for diffuse not defined
 radial integration    : Chebyshev 2nd kind (scaling 3)
 radial gridsize       :                     8
 integration cells     :                    24
 partition function    : becke
 partition sharpness   :                     3
  

 biggest AO integral is expected to be     5.262544080

          ------------------------
          nuclear repulsion energy  :   629.061737538    
          ------------------------


         -----------------
         -S,T+V- integrals
         -----------------

 1e-integrals will be neglected if expon. factor < 0.238031E-12
 
   Difference densities algorithm switched on.
   The maximal number of linear combinations of
   difference densities is                    20 .

 DIIS switched on: error vector is FDS-SDF
 Max. Iterations for DIIS is     :   4
 DIIS matrix (see manual) 
    Scaling factor of diagonals  :  1.200
    threshold for scaling factor :  0.000

 scf convergence criterion : increment of total energy < .1000000D-06
                  and increment of one-electron energy < .1000000D-03

 MOs are in ASCII format !


    mo occupation :
   irrep   mo's   occupied
    a     1152       42
 
 number of basis functions   :  1152
 number of occupied orbitals :    42
 

 reading orbital data $scfmo  from file mos
 orbital characterization : expanded
 virtual MOs provided and orthogonalized by Cholesky decomposition

 automatic virtual orbital shift switched on 
      shift if e(lumo)-e(homo) < 0.10000000    

  
           ------------------------
               RI-J - INFORMATION
           ------------------------
 Contributions to RI integral batches: 
  neglected integral batches:                 13039
  direct contribution:                        38593
  memory contribution:                 13348
 Memory core needed for (P|Q) and Cholesky      4 MByte
 Memory core minimum needed except of (P|Q)     1 MByte
 Total minimum memory core needed (sum)         5 MByte
  
 ****************************************
 Memory allocated for RI-J   368 MByte
 ****************************************
                                            

 DSCF restart information will be dumped onto file mos


 Starting SCF iterations

          Overall gridpoints after grid construction =        114189

 ITERATION  ENERGY          1e-ENERGY        2e-ENERGY     NORM[dD(SAO)]  TOL
   1  -462.76998066830    -1852.4017515     760.57003328    0.000D+00 0.237D-11
                            Exc = -54.3418580931     Coul =  827.790865307    
                            exK = -12.8789739376    
                              N = 83.999861595    
                            current damping = 0.300
 
          max. resid. norm for Fia-block=  4.972D-01 for orbital     14a         
          max. resid. fock norm         =  4.248D+01 for orbital    934a         

 ITERATION  ENERGY          1e-ENERGY        2e-ENERGY     NORM[dD(SAO)]  TOL
   2  -464.25952120760    -1874.8112773     781.49001858    0.145D+03 0.237D-11
                            Exc = -56.0234961703      Eck =  837.513514755    
                              N = 83.999927448    
                            current damping = 0.250
 
          Norm of current diis error:  2.9008    
          max. resid. norm for Fia-block=  7.004D-02 for orbital     13a         
          max. resid. fock norm         =  1.533D-01 for orbital     70a         

 ITERATION  ENERGY          1e-ENERGY        2e-ENERGY     NORM[dD(SAO)]  TOL
   3  -464.36341842212    -1869.5371526     776.11199662    0.606D+02 0.175D-11
                            Exc = -55.8661987655      Eck =  831.978195388    
                              N = 83.999936630    
                            current damping = 0.200
 
          Norm of current diis error:  1.4969    
          max. resid. norm for Fia-block=  2.778D-02 for orbital     23a         
          max. resid. fock norm         =  3.609D-02 for orbital     23a         

 ITERATION  ENERGY          1e-ENERGY        2e-ENERGY     NORM[dD(SAO)]  TOL
   4  -464.39301936418    -1873.5705724     780.11581549    0.977D+01 0.167D-11
                            Exc = -56.0566764334      Eck =  836.172491922    
                              N = 83.999944236    
                            current damping = 0.250
 
          Norm of current diis error: 0.30891    
          max. resid. norm for Fia-block=  7.064D-03 for orbital     40a         
          max. resid. fock norm         =  1.235D-02 for orbital   1147a         

 ITERATION  ENERGY          1e-ENERGY        2e-ENERGY     NORM[dD(SAO)]  TOL
   5  -464.39411208778    -1873.1112574     779.65540773    0.215D+01 0.123D-11
                            Exc = -56.0578033446      Eck =  835.713211075    
                              N = 83.999948759    
                            current damping = 0.300
 
          Norm of current diis error: 0.16916    
          max. resid. norm for Fia-block=  3.307D-03 for orbital     41a         
          max. resid. fock norm         =  8.877D-03 for orbital   1147a         
          mo-orthogonalization: Cholesky decomposition

 ITERATION  ENERGY          1e-ENERGY        2e-ENERGY     NORM[dD(SAO)]  TOL
   6  -464.39455645028    -1873.1122018     779.65590777    0.583D+00 0.109D-11
                            Exc = -56.0556637591      Eck =  835.711571525    
                              N = 83.999948631    
                            current damping = 0.350
 
          Norm of current diis error: 0.47262E-01
          max. resid. norm for Fia-block=  8.593D-04 for orbital     40a         
          max. resid. fock norm         =  3.590D-03 for orbital    216a         

 ITERATION  ENERGY          1e-ENERGY        2e-ENERGY     NORM[dD(SAO)]  TOL
   7  -464.39459159106    -1873.1172024     779.66087328    0.329D+00 0.105D-11
                            Exc = -56.0549241657      Eck =  835.715797441    
                              N = 83.999948861    
                            current damping = 0.200
 
          Norm of current diis error: 0.10878E-01
          max. resid. norm for Fia-block=  2.587D-04 for orbital     27a         
          max. resid. fock norm         =  2.269D-03 for orbital    216a         

 ITERATION  ENERGY          1e-ENERGY        2e-ENERGY     NORM[dD(SAO)]  TOL
   8  -464.39459280386    -1873.1485946     779.69226424    0.195D+00 0.993D-12
                            Exc = -56.0567618643      Eck =  835.749026106    
                              N = 83.999949027    
                            current damping = 0.100
 
          Norm of current diis error: 0.69219E-02
          max. resid. norm for Fia-block=  1.173D-04 for orbital     39a         
          max. resid. fock norm         =  8.811D-04 for orbital    292a         

 ITERATION  ENERGY          1e-ENERGY        2e-ENERGY     NORM[dD(SAO)]  TOL
   9  -464.39459349489    -1873.1320139     779.67568287    0.208D+00 0.958D-12
                            Exc = -56.0559238699      Eck =  835.731606740    
                              N = 83.999949035    
                            current damping = 0.150
 
          Norm of current diis error: 0.14810E-02
          max. resid. norm for Fia-block=  2.941D-05 for orbital     41a         
          max. resid. fock norm         =  1.438D-03 for orbital    292a         

 ITERATION  ENERGY          1e-ENERGY        2e-ENERGY     NORM[dD(SAO)]  TOL
  10  -464.39459351718    -1873.1352195     779.67888846    0.201D+00 0.936D-12
                            Exc = -56.0560075273      Eck =  835.734895988    
                              N = 83.999949033    
                            current damping = 0.200
 
          Norm of current diis error: 0.85515E-03
          max. resid. norm for Fia-block=  1.339D-05 for orbital     41a         
          max. resid. fock norm         =  1.276D-03 for orbital    216a         
          mo-orthogonalization: Cholesky decomposition

 ITERATION  ENERGY          1e-ENERGY        2e-ENERGY     NORM[dD(SAO)]  TOL
  11  -464.39459352402    -1873.1345997     779.67826862    0.207D+00 0.887D-12
                            Exc = -56.0560016912      Eck =  835.734270315    
                              N = 83.999949040    
                            current damping = 0.250
 
          Norm of current diis error: 0.26779E-03
          max. resid. norm for Fia-block=  5.748D-06 for orbital     41a         
          max. resid. fock norm         =  9.798D-04 for orbital    216a         

 ENERGY CONVERGED !

          Overall gridpoints after grid construction =        368523

 ITERATION  ENERGY          1e-ENERGY        2e-ENERGY     NORM[dD(SAO)]  TOL
  12  -464.39459673160    -1873.1344080     779.67807373    0.109D+00 0.834D-12
                            Exc = -56.0559961112      Eck =  835.734069842    
                              N = 83.999995742    
                            current damping = 0.100
 
          Norm of current diis error: 0.10678E-03
          max. resid. norm for Fia-block=  7.888D-06 for orbital     33a         
          max. resid. fock norm         =  3.783D-03 for orbital    216a         

 End of SCF iterations

   convergence criteria satisfied after    12 iterations


                  ------------------------------------------ 
                 |  total energy      =   -464.39459673160  |
                  ------------------------------------------ 
                 :  kinetic energy    =    462.23698088144  :
                 :  potential energy  =   -926.63157761304  :
                 :  virial theorem    =      1.99535391698  :
                 :  wavefunction norm =      1.00000000000  :
                  .......................................... 


 <geterg> : there is no data group $energy 


 <skperg> : $end is missing 


 orbitals $scfmo  will be written to file mos

    irrep                 38a         39a         40a         41a         42a   
 eigenvalues H         -0.34204    -0.26332    -0.25315    -0.25003    -0.24163
            eV          -9.3074     -7.1653     -6.8886     -6.8037     -6.5751
 occupation              2.0000      2.0000      2.0000      2.0000      2.0000

    irrep                 43a         44a         45a         46a         47a   
 eigenvalues H         -0.01554    -0.01074    -0.01016    -0.00550    -0.00027
            eV          -0.4228     -0.2923     -0.2765     -0.1497     -0.0072
 
 
 
 
 ==============================================================================
                           electrostatic moments
 ==============================================================================

 reference point for electrostatic moments:    0.00000   0.00000   0.00000

 
              nuc           elec       ->  total
 ------------------------------------------------------------------------------
                          charge      
 ------------------------------------------------------------------------------
          84.000000     -84.000000       0.000000
 
 ------------------------------------------------------------------------------
                       dipole moment  
 ------------------------------------------------------------------------------
   x      79.367242     -79.367224       0.000018
   y       0.000367      -0.000366       0.000001
   z     277.791231    -277.791436      -0.000205
 
   | dipole moment | =     0.0002 a.u. =     0.0005 debye 
 
 ------------------------------------------------------------------------------
                     quadrupole moment
 ------------------------------------------------------------------------------
  xx     566.780538    -615.718905     -48.938367
  yy     380.765157    -429.315839     -48.550682
  zz    1861.767389   -1921.985465     -60.218076
  xy      -0.891919       0.901208       0.009289
  xz     630.117393    -629.442455       0.674939
  yz      -4.987954       4.939007      -0.048947
 
     1/3  trace=     -52.569041
     anisotropy=      11.538162
 
 ==============================================================================
 
HOMO-LUMO Separation
 HOMO         :   -0.24162879 H =     -6.57506 eV
 LUMO         :   -0.01553732 H =     -0.42279 eV
 HOMO-LUMO gap:    0.22609147 H =     +6.15227 eV
 
 ==============================================================================


    ------------------------------------------------------------------------
         total  cpu-time : 20 hours 54 minutes and 32 seconds
         total wall-time : 23 minutes and 42 seconds
    ------------------------------------------------------------------------

   ****  ridft : all done  ****


    2022-01-31 16:27:40.589 

 ridft ended normally

Hope this helps everyone.

uwe · « **Reply #7 on:** March 13, 2022, 12:08:03 PM »

Hi,

the difference in wall time looks more reasonable, but still too large to be explained by noise only. As you wrote that this was the very same job on the very same machine, something must have changed during the different runs.
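
To put a number on "too large to be explained by noise", the following sketch computes the relative spread of the five reported wall times, with and without the slowest run. The values in seconds are converted by hand from the figures posted earlier in the thread; `relative_spread` is a hypothetical helper, and (max - min) / min is just one simple way to express run-to-run scatter.

```python
from statistics import mean

# Wall times of runs 1-5 in seconds (23:42, 31:35, 25:09, 24:16, 23:52)
wall_seconds = [1422, 1895, 1509, 1456, 1432]


def relative_spread(values):
    """Return (max - min) / min as a fraction."""
    return (max(values) - min(values)) / min(values)


without_slowest = sorted(wall_seconds)[:-1]

print(f"mean wall time:           {mean(wall_seconds):.0f} s")
print(f"spread, all five runs:    {relative_spread(wall_seconds):.1%}")
print(f"spread, without slowest:  {relative_spread(without_slowest):.1%}")
```

Four of the runs agree to within about 6 %, while the second run is roughly a third slower than the fastest one.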

The first thing that comes to my mind is CPU temperature. If the PC is idle and has cooled down, it will run faster in the first couple of minutes, but then throttle the CPU frequency when it heats up. Modern CPUs change their clock speed quite often, which makes it hard to run benchmarks.

I am more concerned about your total wall time. I tried the same job on 48 cores and it was done in 10 minutes using Turbomole 7.5 (the version you also used). I wonder how many cores you have on your machine.

Please try to run lscpu and check the output:

$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  2
Core(s) per socket:  24
Socket(s):           2
NUMA node(s):        2

To get the number of physical cores, multiply Socket(s) by Core(s) per socket; in the example here this gives 48. CPU(s) is given as 96, but only because Hyper-Threading is activated, so each physical core is treated as two virtual ones (Thread(s) per core: 2).
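
The same arithmetic can be scripted. This is a minimal sketch that parses the `Key: value` lines of lscpu output and multiplies sockets by cores per socket; the function names are hypothetical, and it assumes the English field labels shown in the example.

```python
def parse_lscpu(text):
    """Parse 'Key: value' lines of lscpu output into a dict of strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, e.g. wrapped flag lists
            info[key.strip()] = value.strip()
    return info


def physical_cores(info):
    """Number of physical cores: Socket(s) x Core(s) per socket."""
    return int(info["Socket(s)"]) * int(info["Core(s) per socket"])


def logical_cpus(info):
    """Number of logical CPUs: physical cores x Thread(s) per core."""
    return physical_cores(info) * int(info["Thread(s) per core"])


example = """\
CPU(s):              96
Thread(s) per core:  2
Core(s) per socket:  24
Socket(s):           2
"""

info = parse_lscpu(example)
print(physical_cores(info), "physical cores,", logical_cpus(info), "logical CPUs")
```

On a live machine the text could come from `subprocess.run(["lscpu"], capture_output=True, text=True).stdout` instead of the pasted example.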

Next question: what do you want to achieve with the timings? Compare different CPU types? Or find the fastest way to do this kind of calculation?

If you are looking for an optimal system for speed, please note that a) newer versions of Turbomole might be more efficient, and b) using different methods, like semi-numerical treatment of Hartree-Fock exchange for hybrid functionals such as B3-LYP, can have a large impact (especially if the basis set is large or even 'huge', like def2-QZVPPD). See e.g. https://arxiv.org/abs/1610.07779

I took your input and ran four jobs, a default RI-DFT calculation and one with semi-numerical exchange activated ($senex keyword), using either Turbomole version 7.5 or the latest (March 2022) 7.6:

Exchange	Version	Energy (Hartree)	Time
default	7.5	-464.3945967	10 min 1 sec
default	7.6	-464.3945967	7 min 28 sec
senex	7.5	-464.3945679	1 min 15 sec
senex	7.6	-464.3945711	50 sec

For 'production runs' the absolute total energy is not really important, and the error for relative energies is (much) smaller. As you can see, quite some work has been done on the seminumerical exchange algorithm (see https://aip.scitation.org/doi/10.1063/5.0022755 and https://www.turbomole.org/turbomole/release-notes-turbomole-7-6/).
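
For a quick quantitative reading of the table, the sketch below computes the speedup of each job relative to the default 7.5 run and the deviation of the total energy from that run. It assumes the quoted times are wall times, and it uses the standard conversion 1 Hartree = 2625.4996 kJ/mol; the variable names are illustrative only.

```python
HARTREE_TO_KJ_PER_MOL = 2625.4996

# (exchange, version) -> (energy in Hartree, time in seconds), from the table
results = {
    ("default", "7.5"): (-464.3945967, 10 * 60 + 1),
    ("default", "7.6"): (-464.3945967, 7 * 60 + 28),
    ("senex", "7.5"): (-464.3945679, 1 * 60 + 15),
    ("senex", "7.6"): (-464.3945711, 50),
}

# Reference: default exchange with version 7.5
ref_energy, ref_time = results[("default", "7.5")]

for (exchange, version), (energy, seconds) in results.items():
    speedup = ref_time / seconds
    delta = (energy - ref_energy) * HARTREE_TO_KJ_PER_MOL
    print(f"{exchange:8s} {version}: speedup {speedup:5.1f}x, dE = {delta:+.3f} kJ/mol")
```

With these numbers, senex in 7.6 is about 12 times faster than the default 7.5 run, while the total energy shifts by less than 0.1 kJ/mol.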

prasanta13 · « **Reply #8 on:** March 15, 2022, 09:43:35 AM »

The reason I needed those numbers was to compare the time required for PBE (with/without dispersion correction), B3LYP and other XC functionals. This is the lscpu output from the machine on which I ran these jobs.

Code:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          64
On-line CPU(s) list:             0-63
Thread(s) per core:              2
Core(s) per socket:              16
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
Stepping:                        7
CPU MHz:                         2300.000
CPU max MHz:                     3900.0000
CPU min MHz:                     1000.0000
BogoMIPS:                        4600.00
L1d cache:                       1 MiB
L1i cache:                       1 MiB
L2 cache:                        32 MiB
L3 cache:                        44 MiB
NUMA node0 CPU(s):               0-15,32-47
NUMA node1 CPU(s):               16-31,48-63
Vulnerability Itlb multihit:     KVM: Mitigation: VMX unsupported
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; TSX disabled
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
                                 rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl smx
                                 est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm
                                 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2
                                 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec
                                 xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke
                                 avx512_vnni md_clear flush_l1d arch_capabilities

TURBOMOLE Users Forum
