Author Topic: ricc2 dipole moment calculation problem for MPI+amd nodes  (Read 2350 times)

glebreto

  • Newbie
ricc2 dipole moment calculation problem for MPI+amd nodes
« on: February 15, 2024, 06:05:17 PM »
Dear all,

I am new to TURBOMOLE and I am trying to perform ADC(2) or CC2 excited-state calculations. I run on two architectures: x86_64 and em64t. With em64t, both the excitation energies and the oscillator strengths are computed from the same input, with MPI as well as with SMP. With x86_64, the excitation energies are computed with both MPI and SMP, but only SMP works when I ask for the oscillator strengths. The MPI run leads to memory issues, which is strange since it is only a small molecule and quite a lot of RAM is available (128 GB).

Here are the inputs:

coord:
$coord  natoms=     2
    0.00000000000000      0.00000000000000     -0.02489783      cl
    0.00000000000000      0.00000000000000      2.38483140      h
$user-defined bonds
$end

control:
$coord    file=coord
$atoms
  basis =cc-pVDZ
  cbasis =cc-pVDZ
$symmetry c1
$denconv 1.d-8
$eht charge=0 unpaired=0
$ricc2
   adc(2)
   maxiter = 100
   mxdiis = 50
   conv=8
   iprint=5
$excitations
   irrep=a  multiplicity=1  nexc=4
   spectrum states=all operators=diplen
   maxiter = 100
   mxdiis = 50
   conv=8
$freeze
   defcore
$maxcor 70000 mib per_node
$end


and the submission file:
#!/bin/ksh
#$ -N turbomol
#$ -q batch
#$ -pe dmp* 32
#$ -l vendor=amd

module purge
export TURBODIR=/work/shared/icmub/TurboMole/TmoleX2024/TURBOMOLE                                                                                                         
export PATH=$TURBODIR/scripts:$PATH
export PARA_ARCH=MPI
export PATH=$TURBODIR/bin/`sysname`:$PATH

tmpdir='/tmp3/'${JOB_ID}'/TMP'
mkdir -p $tmpdir
export TURBOTMPDIR=$tmpdir

export PARNODES=2
export OMP_NUM_THREADS=2

echo $NSLOTS

ulimit -a

date
dscf &> dscf.out
ricc2 &> ricc2.out
date

and the beginning and end of ricc2.out:

tmpdir in control file set to "/tmp3/2402/TMP".
This directory must exist and be writable by the master process (slave1).
STARTING ricc2 VIA YOUR QUEUING SYSTEM!
RUNNING PROGRAM /work/shared/icmub/TurboMole/TmoleX2024/TURBOMOLE/bin/x86_64-unknown-linux-gnu_mpi/ricc2_mpi.
/work/shared/icmub/TurboMole/TmoleX2024/TURBOMOLE/mpirun_scripts/IMPI/intel64/bin/mpirun -machinefile NodeFile.50523 -genv OMP_NUM_THREADS=2 -genv TURBODIR=/work/shared/icmub/TurboMole/TmoleX2024/TURBOMOLE -genv I_MPI_PIN=off -genv OMP_STACK_SIZE=256M -genv LD_LIBRARY_PATH=/beegfs/data/work/shared/icmub/TurboMole/TmoleX2024/TURBOMOLE/mpirun_scripts/IMPI/intel64//libfabric/lib:/beegfs/data/work/shared/icmub/TurboMole/TmoleX2024/TURBOMOLE/mpirun_scripts/IMPI/intel64//lib/release:/beegfs/data/work/shared/icmub/TurboMole/TmoleX2024/TURBOMOLE/mpirun_scripts/IMPI/intel64//lib:/work/shared/icmub/TurboMole/TmoleX2024/TURBOMOLE/libso/x86_64-unknown-linux-gnu_mpi /work/shared/icmub/TurboMole/TmoleX2024/TURBOMOLE/bin/x86_64-unknown-linux-gnu_mpi/ricc2_mpi
 this is node-proc. number 1 running on node part064.u-bourgogne.fr
 the total number of node-proc. spawned is  33
  parallel platform: MPP or cluster with fast interconnect

   OpenMP run-time library returned nthreads =  2

     Program not compiled with OMP parallelization
     ... only 1 thread can used...    0    2

 ricc2 (part064.u-bourgogne.fr) : TURBOMOLE rev. V7-8 compiled 22 Nov 2023 at 12:25:37
 Copyright (C) 2023 TURBOMOLE GmbH, Karlsruhe


    2024-02-15 17:48:55.656



                              R I C C 2 - PROGRAM

                          the quantum chemistry groups
                             at the universities in
                               Karlsruhe & Bochum
                                   Germany
-------------------------------
caled vector with:  0.983513025232158
renormalized left eigenvector  2
overlap (left|right):  1.0338E+00
scaled vector with:  9.8351E-01
 norm of right eigenvector:   1.00056976448406        1.01702333053144
 scaled vector with:  0.983261612570339
renormalized left eigenvector  3
overlap (left|right):  1.0343E+00
scaled vector with:  9.8326E-01
 norm of right eigenvector:   1.00065470833162        1.01825960571866
 scaled vector with:  0.982067828659689
renormalized left eigenvector  4
overlap (left|right):  1.0369E+00
scaled vector with:  9.8207E-01

      The semi-canonical algorithm will be used for densities


                    ========   CC DENSITY MODULE   ========

                      current wave-function model: ADC(2)

  calculating     4 xi densities

   a semicanonical algorithm will be used when possible

    density nr.      cpu/min        wall/min    L     R
   ------------------------------------------------------
 total memory allocated in ccn5den1:      1 Mbyte
 number of batches in I-loop:   2
 memory allocated per RI-intermediate in I-loop:   1 MByte
 memory allocated per RI-intermediate in j-loop:   1 MByte
 total memory allocated in cc_ybcont:   1 Mbyte
     time in cc_ybcont     cpu:  0.00 sec    wall:  0.00 sec    ratio:  1.0

-----
total memory allocated in ccn5den1:      1 Mbyte
 number of batches in I-loop:   2
 memory allocated per RI-intermediate in I-loop:   1 MByte
 memory allocated per RI-intermediate in j-loop:   1 MByte
 total memory allocated in cc_ybcont:   1 Mbyte
     time in cc_ybcont     cpu:  0.00 sec    wall:  0.00 sec    ratio:  1.0
 number of batches in I-loop:   2
 memory allocated per RI-intermediate in I-loop:   1 MByte
 memory allocated per RI-intermediate in j-loop:   1 MByte
 total memory allocated in cc_ybcont:   1 Mbyte
     time in cc_ybcont     cpu:  0.00 sec    wall:  0.00 sec    ratio:  1.0
 number of batches in I-loop:   2
 memory allocated per RI-intermediate in I-loop:   1 MByte
 memory allocated per RI-intermediate in j-loop:   1 MByte
 total memory allocated in cc_ybcont:   1 Mbyte
     time in cc_ybcont     cpu:  0.00 sec    wall:  0.00 sec    ratio:  1.0
         2             0.00            0.00    LE0    R0
 total memory allocated in ccn5den1:      1 Mbyte
Abort(403292676) on node 9 (rank 8 in comm 496): Fatal error in PMPI_Recv: Invalid tag, error stack:
PMPI_Recv(173): MPI_Recv(buf=0x2b6faeb1f7c0, count=1260, dtype=0x4c000829, src=1, tag=1048577, comm=0x84000002, status=0x7ffe3587d280) failed
PMPI_Recv(105): Invalid tag, value is 1048577
Abort(269074948) on node 10 (rank 9 in comm 496): Fatal error in PMPI_Recv: Invalid tag, error stack:
PMPI_Recv(173): MPI_Recv(buf=0x2b58b7625b40, count=1260, dtype=0x4c000829, src=1, tag=1048577, comm=0x84000002, status=0x7ffccfe71c80) failed
PMPI_Recv(105): Invalid tag, value is 1048577
-----
PMPI_Recv(105): Invalid tag, value is 1048577
Abort(805945860) on node 13 (rank 12 in comm 496): Fatal error in PMPI_Recv: Invalid tag, error stack:
PMPI_Recv(173): MPI_Recv(buf=0x2b4fef781840, count=1260, dtype=0x4c000829, src=1, tag=1048577, comm=0x84000002, status=0x7ffe3cb10f00) failed
PMPI_Recv(105): Invalid tag, value is 1048577
Abort(671728132) on node 18 (rank 17 in comm 496): Fatal error in PMPI_Recv: Invalid tag, error stack:
PMPI_Recv(173): MPI_Recv(buf=0x2b39ec0dc1c0, count=1260, dtype=0x4c000829, src=1, tag=1048577, comm=0x84000002, status=0x7ffd404f0200) failed
PMPI_Recv(105): Invalid tag, value is 1048577
Abort(671728132) on node 29 (rank 28 in comm 496): Fatal error in PMPI_Recv: Invalid tag, error stack:
PMPI_Recv(173): MPI_Recv(buf=0x2b634cd4b7c0, count=1260, dtype=0x4c000829, src=1, tag=1048577, comm=0x84000002, status=0x7fffa1659c00) failed
PMPI_Recv(105): Invalid tag, value is 1048577

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 50712 RUNNING AT part064.u-bourgogne.fr
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

Here is the output of the submission script:
Prologue begin
Starter begin : part064.u-bourgogne.fr(49194)
jeu. févr. 15 17:48:43 CET 2024
Version CentOS : 7.7
Starter(49194): PATH=/usr/ccub/sge/scripts:/tmp3/2402.1.batch:/work/shared/icmub/TurboMole/TmoleX2024/TURBOMOLE/bin/em64t-unknown-linux-gnu:/work/shared/icmub/TurboMole/TmoleX2024/TURBOMOLE/scripts:/soft/c7/gv/6.1.1/gv:/soft/c7/spack/0.18.0/packages/linux-centos7-haswell/gcc/11.2.0/gcc/4.8.5/g75x5bhqcqxorvp32f6vs2h3e4vb7tpm/bin:/usr/lib64/qt-3.3/bin:/soft/c7/modules/4.1.2/bin:/usr/ccub/sge-8.1.8/bin:/usr/ccub/sge-8.1.8/bin/lx-amd64:/user1/icmub/gu9875le/bin:/bin:/usr/bin:/usr/sbin:/etc:/usr/ccub/bin:/usr/local/bin:/user1/icmub/gu9875le/bin:.:/work/shared/icmub/bin:/soft/c7/gaussian/16avx2/g16/bsd:/soft/c7/gaussian/16avx2/g16/local:/soft/c7/gaussian/16avx2/g16/extras:/soft/c7/gaussian/16avx2/g16
Starter exec(49194) : '/usr/ccub/sge-8.1.8/ccub/spool/part064/job_scripts/2402'
32
time(cpu-seconds)    unlimited
file(blocks)         unlimited
coredump(blocks)     unlimited
data(KiB)            unlimited
stack(KiB)           unlimited
lockedmem(KiB)       unlimited
nofiles(descriptors) 1024
processes            unlimited
flocks               unlimited
sigpending           513331
msgqueue(bytes)      819200
maxnice              0
maxrtprio            0
address-space(KiB)   unlimited
jeu. févr. 15 17:48:44 CET 2024
jeu. févr. 15 17:49:01 CET 2024
Starter(49194): Return code=0
Starter end(49194)

Do you have any ideas?
Best,
Guillaume

uwe

  • Global Moderator
Re: ricc2 dipole moment calculation problem for MPI+amd nodes
« Reply #1 on: February 20, 2024, 05:03:03 PM »
Hello,

If the number of processes is large but the input is small, some MPI processes will not get any tasks in some parts of the code, which can lead to messaging problems. This should of course not happen, so it is good to get a bug report.
 
Depending on how the parallelization is done, the limiting factor for the size of the input could be something like the number of occupied orbitals, which is 9 in your HCl case (18 electrons, i.e. 9 doubly occupied orbitals). You tried to run the job on 33 processes, which is most likely far too many for a two-atom input for which the serial version runs in just a couple of seconds.

Your input runs fine for me on 2 cores in parallel with MPI, but I'd recommend trying a larger input for testing.
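
For a quick interactive test on a single node I would try something along these lines (just a sketch, assuming TURBODIR is set up as in your submission script and no batch system is involved):

export PARA_ARCH=MPI
export PATH=$TURBODIR/scripts:$PATH
export PATH=$TURBODIR/bin/`sysname`:$PATH
export PARNODES=2                 # number of parallel MPI workers for an interactive run
dscf  > dscf.out  2>&1
ricc2 > ricc2.out 2>&1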

Best Regards

glebreto

  • Newbie
Re: ricc2 dipole moment calculation problem for MPI+amd nodes
« Reply #2 on: February 21, 2024, 09:10:19 AM »
Hello,

Thanks for your suggestions. How should I run on 2 cores in parallel with MPI on a machine that has 32 cores? In the submission script above, I set:

export PARNODES=2
export OMP_NUM_THREADS=2

and yet 33 processes start instead of 2.
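
My guess (please correct me if I am wrong) is that the process count is taken from the queue allocation rather than from PARNODES: the script requests 32 slots (#$ -pe dmp* 32), ricc2 prints "STARTING ricc2 VIA YOUR QUEUING SYSTEM!" and spawns 33 node-procs, presumably 32 workers plus one master. If that is right, a run on 2 cores would only need the slot request changed, roughly like this (sketch of the modified lines only, with our cluster's dmp* parallel environment assumed):

#$ -pe dmp* 2           # request only 2 slots from SGE
export PARNODES=2       # kept for interactive runs; the queue allocation seems to take precedence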
 

I also tried with aniline and with this molecule:

$coord
   -8.31772213412044      2.35528661159307      0.00000566917837       c
   -5.74708611281726      1.68082068100064      0.00002456643962       c
   -5.02669605794417     -0.91132420305303      0.00001322808287       c
   -6.84130556911606     -2.84734483827988     -0.00004346370087       c
   -9.35925035688630     -2.15343929504342     -0.00006236096211       c
  -10.07869932814933      0.42537357120101     -0.00002834589187       c
   -2.35207029772896     -0.84979094098296      0.00002267671350       c
   -1.73118377194804      1.77591925849631     -0.00000377945225       n
   -3.71886196051766      3.29465057088330      0.00003401507024       n
   -0.61420256447812     -2.62628089676891      0.00004913287924       n
    1.73137274456050     -1.77546383450027      0.00001133835675       n
    2.35210053334695      0.84903127108086     -0.00000188972612       c
    0.61405138638815      2.62631302211303      0.00003968424862       n
    5.02651086478396      0.91116546605856      0.00007180959274       c
    5.74706154637764     -1.68116650088145      0.00005858150986       c
    8.31808685126249     -2.35518456638234      0.00002456643962       c
   10.07865208499621     -0.42515436297056     -0.00005291233149       c
    9.35889319864875      2.15379267382873     -0.00006803014049       c
    6.84105045608924      2.84742798622936     -0.00004157397474       c
    3.71926825163446     -3.29486977911375     -0.00000566917837       n
  -10.81255368166037     -3.59482033769723     -0.00009826575848       h
   -6.27814639699021     -4.81502594499871     -0.00006614041436       h
   -8.87810151911696      4.32341369367732      0.00003212534412       h
    6.27756436134383      4.81503161417709     -0.00006614041436       h
   10.81221731041019      3.59517182675641     -0.00013606028097       h
    8.87871190065522     -4.32324739777835      0.00004913287924       h
   12.07361327805014     -0.89154632943269     -0.00008692740173       h
  -12.07358493215827      0.89215293151870     -0.00003590479637       h
$end

and I get similar issues.

In case it helps: in the meantime I also had memory issues while performing a CC2/ADC(2) geometry optimization, at the same step of the calculation, i.e. when the CC density module is called with MPI. It worked fine with SMP.

Best,
Guillaume



uwe

  • Global Moderator
Re: ricc2 dipole moment calculation problem for MPI+amd nodes
« Reply #3 on: March 08, 2024, 07:53:37 PM »
Hello,

Christof found out what the problem is: the MPI version used in newer Turbomole releases is too new. Intel MPI 2019/2021 reduced the maximum allowed value for MPI tags (that is an implementation detail and can only be changed when building ricc2, not at run time).

Turbomole 7.5.1 came with an older Intel MPI version which still works. So as a workaround, if you have older Turbomole releases available, use 7.5.1 (or older) for this job. The alternative is of course to use the latest version, but with SMP instead of MPI.
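
For the SMP route, essentially only PARA_ARCH needs to change in the job script, roughly like this (a minimal sketch with the paths from your script; the worker count of 8 is just an example, adjust it to your node):

export PARA_ARCH=SMP
export PATH=$TURBODIR/scripts:$PATH
export PATH=$TURBODIR/bin/`sysname`:$PATH
export PARNODES=8                 # number of shared-memory workers on one node
dscf  > dscf.out  2>&1
ricc2 > ricc2.out 2>&1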

A fix will be available in one of the next official releases of Turbomole.

Sorry for the inconvenience...