Author Topic: aoforce parallel version problem  (Read 4096 times)

gopa

  • Newbie
  • *
  • Posts: 4
  • Karma: +0/-0
aoforce parallel version problem
« on: February 08, 2021, 12:46:21 PM »
Hi,

I am running an aoforce calculation for a molecule with 240 atoms
on an HPC cluster (24 cores, 128 GB total RAM). I set the following
parameters in my shell script to run the parallel version of aoforce.

Code:
setenv LD_LIBRARY_PATH /TURBOMOLDE/libso/x86_64-unknown-linux-gnu_smp
setenv PARA_ARCH SMP
setenv PARNODES 24
setenv TURBOMOLE_SYSNAME x86_64-unknown-linux-gnu_smp
setenv TURBOARCH x86_64-unknown-linux-gnu_smp
setenv SMPCPUS 24
setenv TURBOMOLE_SYSNAME `sysname`

I see that the aoforce module starts the parallel version and prints the
following lines at the beginning:

Code:
SMPCPUS    set: Shared-memory Parallelization with  24 CPUs.
SMP Parallelization Reference:
C. van Wullen, J. Comput. Chem. 32 (2011) 1195--1201
operating system is UNIX !

I have set the maxcor value to 84000 MB (roughly 65% of the total RAM).

However, aoforce terminates with the following error
(please see the last part of the aoforce output):


 
Code:
 CONSTRUCTING S(i,j)xi
      ...terminated. cpu:     217.45       wall:     217.63

 CONSTRUCTING <i|x,y,z|j>*S(i,j)xi          -> Dip. deriv.
      ...terminated. cpu:       7.56       wall:       7.57

 CONSTRUCTING epsilon(i)*S(i,j)xi*S(i,j)chi -> Hessian
      ...terminated. cpu:      20.71       wall:      20.73

 CONSTRUCTING G(a,i)[S(k,l)xi]              -> RHS
              G(i,j)[S(k,l)xi]*S(i,j)chi    -> Hessian
 
         Maximum core memory set to                  84000 MB
         This corresponds to                   971 vectors in CAO basis

========================
 internal module stack:
------------------------
    force
    mkg_sxi
    lpdrc1
========================

 smp_fork: cannot fork
 force ended abnormally


I have searched the forum without luck.
Could somebody help me figure out what is going wrong here?


uwe

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 558
  • Karma: +0/-0
Re: aoforce parallel version problem
« Reply #1 on: February 11, 2021, 11:07:07 AM »
Hi,

Which version of Turbomole are you using? In the Turbomole installation directory there should be a file named TURBOMOLE_<version-number>. In newer Turbomole releases, the default parallelization changed from the shared-memory implementation of C. van Wüllen (as given in the output) to the OpenMP version.

Regards,
Uwe

gopa

Re: aoforce parallel version problem
« Reply #2 on: February 11, 2021, 11:14:06 AM »
I figured out the problem. It was a memory issue.
I was also setting the $ricore parameter to a large value along with $maxcor, and together they were exhausting the memory.
I reduced the $ricore value and the calculation went well.
Thanks a lot for all the help, and sorry for this trivial post.

gopa

Re: aoforce parallel version problem
« Reply #3 on: February 11, 2021, 11:15:54 AM »
I was using version 7.3.1.
Can you confirm whether the default parallelization is SMP or OpenMP for this version?

uwe

Re: aoforce parallel version problem
« Reply #4 on: February 11, 2021, 11:54:12 AM »
Hi,

Turbomole 7.3.1 already uses the OpenMP version by default. The output of aoforce should contain:

 
Quote
         OpenMP Shared-Memory Parallelization:  8 CPUs.

            By: Erik P. Almaraz and Filipp Furche
            Copyright 2009-2011 by UCI and TURBOMOLE GmbH.

Either you are using an older version (check the output; it should start with a line telling you which version you use), or you have set the TM_PAR_FORK environment variable, which switches back from OpenMP to the fork/shared-memory version.
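To check for the second case before submitting a job, one can inspect the environment the job will inherit. The following is only a minimal Python sketch: the helper name `parallel_mode` is hypothetical, it assumes a Turbomole version where OpenMP is the default, and it assumes that any value of TM_PAR_FORK (even an empty one) counts as "set".

```python
import os


def parallel_mode(env=None):
    """Guess which shared-memory mode a job would use, based on TM_PAR_FORK.

    Assumption: the variable only needs to exist; its value is not inspected.
    """
    # Fall back to the real process environment when no mapping is passed in.
    env = os.environ if env is None else env
    return "fork/shared-memory" if "TM_PAR_FORK" in env else "OpenMP"


# Report the mode for the current shell environment.
print(parallel_mode())
```

If this reports the fork version unexpectedly, removing the variable from the job script (for example with `unsetenv TM_PAR_FORK` in csh) should bring back the OpenMP default.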

About memory problems: note that the settings in $ricore and $maxcor are per thread. If you set $maxcor 1000 and run the job on 40 cores, it will use 40 times 1000 MB, so roughly 40 GB. Setting it to 84 GB with 24 CPUs would need about 2 TB of memory.
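The arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, it assumes that the per-thread $maxcor and $ricore amounts simply add up, and it takes 128 GB as 128000 MB.

```python
def total_memory_mb(maxcor_mb, threads, ricore_mb=0):
    """Total memory if $maxcor and $ricore are both allocated per thread.

    Assumption: the two settings simply add for each thread.
    """
    return (maxcor_mb + ricore_mb) * threads


def max_per_thread_mb(budget_mb, threads):
    """Largest per-thread setting that fits into a total memory budget."""
    return budget_mb // threads


print(total_memory_mb(1000, 40))    # 40000 MB, roughly 40 GB
print(total_memory_mb(84000, 24))   # 2016000 MB, roughly 2 TB

# Budget of 65% of a 128 GB node, shared by 24 threads.
budget_mb = 128000 * 65 // 100
print(max_per_thread_mb(budget_mb, 24))  # 3466 MB per thread
```

So on a 24-core node with 128 GB, the per-thread settings ($maxcor plus $ricore) would have to stay at around 3.4 GB in total to remain within 65% of the RAM.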

Regards,
Uwe

gopa

Re: aoforce parallel version problem
« Reply #5 on: February 12, 2021, 03:17:49 AM »
Dear Uwe,
 
You are right. I had set the TM_PAR_FORK environment variable.
I have removed it, and aoforce_omp is now running on the cluster with the following message
in the aoforce output:

Quote
   OpenMP run-time library returned nthreads = 24
   operating system is UNIX !

Many thanks for your help.

Just to clarify, which version should we use for the most efficient use of the cores: SMP or OpenMP?




« Last Edit: February 12, 2021, 03:19:55 AM by g.gopakumar »