Dear all,
I am new to TURBOMOLE and I am trying to run ADC(2) or CC2 excited-state calculations. I run the same input on two architectures, x86_64 and em64t. On em64t, both the excitation energies and the oscillator strengths are computed successfully with either MPI or SMP. On x86_64, the excitation energies are computed with either MPI or SMP, but when I also request the oscillator strengths only the SMP version works. The MPI run leads to memory issues, which is strange since this is only a small molecule (HCl) and quite a lot of RAM is available (128 GB).
Here are the input files:
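One quick sanity check on the memory side is to compare the "$maxcor 70000 mib per_node" value from the control file below with the physical memory of the compute node. The POSIX sh functions below are a hypothetical sketch (maxcor_mib, node_mem_mib and check_maxcor are not TURBOMOLE tools); they assume the $maxcor value is given in MiB and read MemTotal from Linux's /proc/meminfo.

```shell
# Hypothetical helpers (not part of TURBOMOLE).
# Assumes the control file contains a line like "$maxcor 70000 mib per_node",
# i.e. the keyword, then a number in MiB, then optional qualifiers.

maxcor_mib() {
    # Print the numeric $maxcor value from the given control file.
    awk '$1 == "$maxcor" { print $2; exit }' "${1:-control}"
}

node_mem_mib() {
    # MemTotal in /proc/meminfo is given in kB (KiB); convert to MiB.
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024; exit }' /proc/meminfo
}

check_maxcor() {
    req=$(maxcor_mib "$1")
    avail=$(node_mem_mib)
    echo "maxcor: ${req} MiB, node memory: ${avail} MiB"
    if [ "$req" -gt "$avail" ]; then
        echo "warning: \$maxcor exceeds the physical memory of this node"
    fi
}
```

It could be run as "check_maxcor control" on the compute node, e.g. from the job script just before ricc2.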
coord:
$coord natoms= 2
0.00000000000000 0.00000000000000 -0.02489783 cl
0.00000000000000 0.00000000000000 2.38483140 h
$user-defined bonds
$end
control:
$coord file=coord
$atoms
basis =cc-pVDZ
cbasis =cc-pVDZ
$symmetry c1
$denconv 1.d-8
$eht charge=0 unpaired=0
$ricc2
adc(2)
maxiter = 100
mxdiis = 50
conv=8
iprint=5
$excitations
irrep=a multiplicity=1 nexc=4
spectrum states=all operators=diplen
maxiter = 100
mxdiis = 50
conv=8
$freeze
defcore
$maxcor 70000 mib per_node
$end
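Assuming the energies-only runs differ from this input only by the "spectrum states=all operators=diplen" line of the $excitations group, both cases can be generated from the same control file. A minimal sketch; strip_spectrum is a hypothetical helper name, and it only removes that one line.

```shell
# Hypothetical helper (not part of TURBOMOLE): copy a control file without
# the "spectrum" line, so that ricc2 is asked for excitation energies only
# (no oscillator strengths). All other lines are copied unchanged.
strip_spectrum() {
    src=${1:-control}
    dst=${2:-control.energies_only}
    # Delete lines whose first word is "spectrum" (leading blanks allowed).
    sed '/^[[:space:]]*spectrum[[:space:]]/d' "$src" > "$dst"
}
```

For example, "strip_spectrum job/control job_energies/control" after copying the job directory.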
And the submission script:
#!/bin/ksh
#$ -N turbomol
#$ -q batch
#$ -pe dmp* 32
#$ -l vendor=amd
module purge
export TURBODIR=/work/shared/icmub/TurboMole/TmoleX2024/TURBOMOLE
export PATH=$TURBODIR/scripts:$PATH
export PARA_ARCH=MPI
export PATH=$TURBODIR/bin/`sysname`:$PATH
tmpdir='/tmp3/'${JOB_ID}'/TMP'
mkdir -p $tmpdir
export TURBOTMPDIR=$tmpdir
export PARNODES=2
export OMP_NUM_THREADS=2
echo $NSLOTS
ulimit -a
date
dscf &> dscf.out
ricc2 &> ricc2.out
date
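The script exports PARNODES=2, but the PE requests 32 slots ("echo $NSLOTS" prints 32 in the job output below) and the ricc2.out excerpt below reports 33 spawned processes, so PARNODES apparently does not set the process count here. A hypothetical check (check_parallel_env is not a TURBOMOLE tool) that could be called after the exports and before dscf to print these settings and flag the mismatch:

```shell
# Hypothetical helper (not part of TURBOMOLE): print the parallel settings
# exported by the job script and warn if PARNODES differs from the number
# of slots granted by SGE (NSLOTS).
check_parallel_env() {
    slots=${NSLOTS:-unset}
    nodes=${PARNODES:-unset}
    echo "PARA_ARCH=${PARA_ARCH:-unset} NSLOTS=$slots PARNODES=$nodes OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}"
    if [ "$slots" != "$nodes" ]; then
        echo "warning: PARNODES ($nodes) differs from NSLOTS ($slots)"
    fi
}
```

Note that ricc2.out also reports "Program not compiled with OMP parallelization", so OMP_NUM_THREADS=2 may have no effect for ricc2_mpi.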
And the beginning and end of ricc2.out:
tmpdir in control file set to "/tmp3/2402/TMP".
This directory must exist and be writable by the master process (slave1).
STARTING ricc2 VIA YOUR QUEUING SYSTEM!
RUNNING PROGRAM /work/shared/icmub/TurboMole/TmoleX2024/TURBOMOLE/bin/x86_64-unknown-linux-gnu_mpi/ricc2_mpi.
/work/shared/icmub/TurboMole/TmoleX2024/TURBOMOLE/mpirun_scripts/IMPI/intel64/bin/mpirun -machinefile NodeFile.50523 -genv OMP_NUM_THREADS=2 -genv TURBODIR=/work/shared/icmub/TurboMole/TmoleX2024/TURBOMOLE -genv I_MPI_PIN=off -genv OMP_STACK_SIZE=256M -genv LD_LIBRARY_PATH=/beegfs/data/work/shared/icmub/TurboMole/TmoleX2024/TURBOMOLE/mpirun_scripts/IMPI/intel64//libfabric/lib:/beegfs/data/work/shared/icmub/TurboMole/TmoleX2024/TURBOMOLE/mpirun_scripts/IMPI/intel64//lib/release:/beegfs/data/work/shared/icmub/TurboMole/TmoleX2024/TURBOMOLE/mpirun_scripts/IMPI/intel64//lib:/work/shared/icmub/TurboMole/TmoleX2024/TURBOMOLE/libso/x86_64-unknown-linux-gnu_mpi /work/shared/icmub/TurboMole/TmoleX2024/TURBOMOLE/bin/x86_64-unknown-linux-gnu_mpi/ricc2_mpi
this is node-proc. number 1 running on node part064.u-bourgogne.fr
the total number of node-proc. spawned is 33
parallel platform: MPP or cluster with fast interconnect
OpenMP run-time library returned nthreads = 2
Program not compiled with OMP parallelization
... only 1 thread can used... 0 2
ricc2 (part064.u-bourgogne.fr) : TURBOMOLE rev. V7-8 compiled 22 Nov 2023 at 12:25:37
Copyright (C) 2023 TURBOMOLE GmbH, Karlsruhe
2024-02-15 17:48:55.656
R I C C 2 - PROGRAM
the quantum chemistry groups
at the universities in
Karlsruhe & Bochum
Germany
-------------------------------
scaled vector with: 0.983513025232158
renormalized left eigenvector 2
overlap (left|right): 1.0338E+00
scaled vector with: 9.8351E-01
norm of right eigenvector: 1.00056976448406 1.01702333053144
scaled vector with: 0.983261612570339
renormalized left eigenvector 3
overlap (left|right): 1.0343E+00
scaled vector with: 9.8326E-01
norm of right eigenvector: 1.00065470833162 1.01825960571866
scaled vector with: 0.982067828659689
renormalized left eigenvector 4
overlap (left|right): 1.0369E+00
scaled vector with: 9.8207E-01
The semi-canonical algorithm will be used for densities
======== CC DENSITY MODULE ========
current wave-function model: ADC(2)
calculating 4 xi densities
a semicanonical algorithm will be used when possible
density nr. cpu/min wall/min L R
------------------------------------------------------
total memory allocated in ccn5den1: 1 Mbyte
number of batches in I-loop: 2
memory allocated per RI-intermediate in I-loop: 1 MByte
memory allocated per RI-intermediate in j-loop: 1 MByte
total memory allocated in cc_ybcont: 1 Mbyte
time in cc_ybcont cpu: 0.00 sec wall: 0.00 sec ratio: 1.0
-----
total memory allocated in ccn5den1: 1 Mbyte
number of batches in I-loop: 2
memory allocated per RI-intermediate in I-loop: 1 MByte
memory allocated per RI-intermediate in j-loop: 1 MByte
total memory allocated in cc_ybcont: 1 Mbyte
time in cc_ybcont cpu: 0.00 sec wall: 0.00 sec ratio: 1.0
number of batches in I-loop: 2
memory allocated per RI-intermediate in I-loop: 1 MByte
memory allocated per RI-intermediate in j-loop: 1 MByte
total memory allocated in cc_ybcont: 1 Mbyte
time in cc_ybcont cpu: 0.00 sec wall: 0.00 sec ratio: 1.0
number of batches in I-loop: 2
memory allocated per RI-intermediate in I-loop: 1 MByte
memory allocated per RI-intermediate in j-loop: 1 MByte
total memory allocated in cc_ybcont: 1 Mbyte
time in cc_ybcont cpu: 0.00 sec wall: 0.00 sec ratio: 1.0
2 0.00 0.00 LE0 R0
total memory allocated in ccn5den1: 1 Mbyte
Abort(403292676) on node 9 (rank 8 in comm 496): Fatal error in PMPI_Recv: Invalid tag, error stack:
PMPI_Recv(173): MPI_Recv(buf=0x2b6faeb1f7c0, count=1260, dtype=0x4c000829, src=1, tag=1048577, comm=0x84000002, status=0x7ffe3587d280) failed
PMPI_Recv(105): Invalid tag, value is 1048577
Abort(269074948) on node 10 (rank 9 in comm 496): Fatal error in PMPI_Recv: Invalid tag, error stack:
PMPI_Recv(173): MPI_Recv(buf=0x2b58b7625b40, count=1260, dtype=0x4c000829, src=1, tag=1048577, comm=0x84000002, status=0x7ffccfe71c80) failed
PMPI_Recv(105): Invalid tag, value is 1048577
-----
PMPI_Recv(105): Invalid tag, value is 1048577
Abort(805945860) on node 13 (rank 12 in comm 496): Fatal error in PMPI_Recv: Invalid tag, error stack:
PMPI_Recv(173): MPI_Recv(buf=0x2b4fef781840, count=1260, dtype=0x4c000829, src=1, tag=1048577, comm=0x84000002, status=0x7ffe3cb10f00) failed
PMPI_Recv(105): Invalid tag, value is 1048577
Abort(671728132) on node 18 (rank 17 in comm 496): Fatal error in PMPI_Recv: Invalid tag, error stack:
PMPI_Recv(173): MPI_Recv(buf=0x2b39ec0dc1c0, count=1260, dtype=0x4c000829, src=1, tag=1048577, comm=0x84000002, status=0x7ffd404f0200) failed
PMPI_Recv(105): Invalid tag, value is 1048577
Abort(671728132) on node 29 (rank 28 in comm 496): Fatal error in PMPI_Recv: Invalid tag, error stack:
PMPI_Recv(173): MPI_Recv(buf=0x2b634cd4b7c0, count=1260, dtype=0x4c000829, src=1, tag=1048577, comm=0x84000002, status=0x7fffa1659c00) failed
PMPI_Recv(105): Invalid tag, value is 1048577
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 50712 RUNNING AT part064.u-bourgogne.fr
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
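Besides the final SIGKILL, every abort shown comes from PMPI_Recv rejecting tag 1048577 (= 2^20 + 1) as invalid. The MPI standard only guarantees tag values up to 32767 (the MPI_TAG_UB attribute) and the actual upper bound is implementation-defined, which may be relevant here. A small hypothetical helper (summarize_mpi_aborts is not a TURBOMOLE tool) to count the aborts and the distinct rejected tags in a ricc2 output file:

```shell
# Hypothetical helper: count the "Abort(...)" lines in a ricc2 output file
# and list each distinct "Invalid tag" value with the number of lines
# reporting it.
summarize_mpi_aborts() {
    log=${1:-ricc2.out}
    echo "Abort lines: $(grep -c '^Abort(' "$log")"
    grep -o 'Invalid tag, value is [0-9]*' "$log" | sort | uniq -c
}
```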
Here is the output of the submission script:
Prologue begin
Starter begin : part064.u-bourgogne.fr(49194)
Thu Feb 15 17:48:43 CET 2024
Version CentOS : 7.7
Starter(49194): PATH=/usr/ccub/sge/scripts:/tmp3/2402.1.batch:/work/shared/icmub/TurboMole/TmoleX2024/TURBOMOLE/bin/em64t-unknown-linux-gnu:/work/shared/icmub/TurboMole/TmoleX2024/TURBOMOLE/scripts:/soft/c7/gv/6.1.1/gv:/soft/c7/spack/0.18.0/packages/linux-centos7-haswell/gcc/11.2.0/gcc/4.8.5/g75x5bhqcqxorvp32f6vs2h3e4vb7tpm/bin:/usr/lib64/qt-3.3/bin:/soft/c7/modules/4.1.2/bin:/usr/ccub/sge-8.1.8/bin:/usr/ccub/sge-8.1.8/bin/lx-amd64:/user1/icmub/gu9875le/bin:/bin:/usr/bin:/usr/sbin:/etc:/usr/ccub/bin:/usr/local/bin:/user1/icmub/gu9875le/bin:.:/work/shared/icmub/bin:/soft/c7/gaussian/16avx2/g16/bsd:/soft/c7/gaussian/16avx2/g16/local:/soft/c7/gaussian/16avx2/g16/extras:/soft/c7/gaussian/16avx2/g16
Starter exec(49194) : '/usr/ccub/sge-8.1.8/ccub/spool/part064/job_scripts/2402'
32
time(cpu-seconds) unlimited
file(blocks) unlimited
coredump(blocks) unlimited
data(KiB) unlimited
stack(KiB) unlimited
lockedmem(KiB) unlimited
nofiles(descriptors) 1024
processes unlimited
flocks unlimited
sigpending 513331
msgqueue(bytes) 819200
maxnice 0
maxrtprio 0
address-space(KiB) unlimited
Thu Feb 15 17:48:44 CET 2024
Thu Feb 15 17:49:01 CET 2024
Starter(49194): Return code=0
Starter end(49194)
Does anyone have any ideas?
Best,
Guillaume