Author Topic: Newly installed Turbomole running slow  (Read 8249 times)

mariavd

  • Newbie
  • *
  • Posts: 8
  • Karma: +0/-0
Newly installed Turbomole running slow
« on: September 15, 2017, 03:08:27 PM »
Hello,

I installed Turbomole 7.1 on a compute cluster and set up the environment variables according to the README. The TTEST completed successfully. A sample single-point calculation on benzene works fine; with the same control file and the same SLURM job script, it even completed faster than on our other cluster. However, when I ran dscf on a large molecule, the SCF iterations were very slow: on the other cluster dscf finished in 8 minutes, but here the job timed out 2 hours after submission with only 4 iterations completed. I requested 2 nodes with 32 cores in total in both cases, and the CPUs in the two clusters are identical (Xeon E5-2620 v3). In both cases the dscf output reports that it is running on 32 processors.

Which is more likely: that the SLURM system is not set up properly, or that I missed something when setting up Turbomole? I was given only the binaries, so I do not know what compiler flags were used.

uwe

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 558
  • Karma: +0/-0
Re: Newly installed Turbomole running slow
« Reply #1 on: September 15, 2017, 04:10:02 PM »
Hi,

hm, was that on the SLURM cluster at the CSC in Espoo? Did you allocate all cores on the nodes?

To figure out what is happening, it would be useful to see the timings of the individual steps. If you add $profile to the control file and run the job, you will get detailed timings at the end of the output.
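The keyword is just one extra line in the control file; anywhere before $end works. A minimal sketch (the rest of your control file stays unchanged):

...
$profile
$end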

If things do not improve even when all CPUs on the nodes are used, please contact the Turbomole support for help.

Regards,

Uwe


mariavd

  • Newbie
  • *
  • Posts: 8
  • Karma: +0/-0
Re: Newly installed Turbomole running slow
« Reply #2 on: September 15, 2017, 11:10:21 PM »
Hello,

Thank you for the quick reply. Turbomole runs fine on the CSC cluster; I had to set it up on another cluster at the university. That cluster has about a dozen free nodes at the moment, so I guess that if the SLURM system is set up correctly, it should send the jobs to CPUs on the same node.

Yes, I allocate all CPUs on the nodes. The Xeon E5-2620 v3 is an octa-core CPU with hyper-threading, so the script requests 16 tasks per node:

#SBATCH --nodes 2         # for SMP only 1 is possible
#SBATCH --ntasks-per-node=16 # Tasks per node
#SBATCH --ntasks 32      # total number of cores (processes)
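The rest of the script just sets up the standard Turbomole environment as described in the README, roughly like this (a sketch; the install path is a placeholder):

export TURBODIR=/path/to/TURBOMOLE         # placeholder for the actual install location
export PARA_ARCH=MPI                       # use the MPI binaries across the two nodes
export PARNODES=32                         # one parallel worker per requested task
export PATH=$TURBODIR/scripts:$PATH
export PATH=$TURBODIR/bin/`sysname`:$PATH  # sysname selects the matching binary directory
dscf > dscf.out 2>&1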

The $profile option for a simple dscf run on the cyclopentadienyl cation returned the following:

    dscf profiling
  --------------------------------------------------------------------
             module   cpu total (s)       %  wall total (s)       %

         dscf.total             1.2  100.00            18.8  100.00
       dscf.prepare             0.1    5.60             0.8    4.27
     prepare.oneint             0.0    0.72             0.1    0.68
    prepare.moinput             0.0    1.43             0.2    1.30
    prepare.orthmos             0.0    0.52             0.1    0.30
           dscf.scf             1.1   93.88            17.8   94.68
            scf.pre             0.0    0.08             0.1    0.42
       scf.makedmat             0.0    0.22             0.1    0.38
         scf.shlupf             0.8   65.94            12.5   66.40
        dscf.shloop             0.8   63.94            10.7   56.77
         scf.symcar             0.0    0.11             0.0    0.00
       scf.makefock             0.0    0.17             0.3    1.45
         scf.energy             0.0    0.01             0.0    0.00
         scf.pardft             0.3   21.02             3.8   20.21
       dft_grid_con             0.0    0.82             0.0    0.04
         scf.newerg             0.0    0.00             0.0    0.00
         scf.newcnv             0.0    0.35             0.1    0.72
          scf.fdiag             0.0    1.52             0.0    0.25
        diag_tritrn             0.0    0.29             0.0    0.02
         diag_rdiag             0.0    1.14             0.0    0.20
         scf.modump             0.0    2.70             0.5    2.77
           scf.post             0.0    1.58             0.2    1.14
       dscf.postscf             0.0    0.48             0.2    1.04

    ------------------------------------------------------------------------
         total  cpu-time :   1.25 seconds
         total wall-time :  20.15 seconds
    ------------------------------------------------------------------------

The difference between the wall-clock time and the CPU time is huge (20.15 s wall vs. 1.25 s CPU, a factor of about 16), which makes me think that there is some communication delay between the nodes. I will first contact the IT support at the department, and if needed, the Turbomole support as well.
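Before that, I will do a quick sanity check to see where SLURM actually places the tasks, counting tasks per node with plain srun:

srun --ntasks=32 hostname | sort | uniq -c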

Cheers,
Maria