Hello,
I've got a puzzling problem with the parallel Global Arrays (GA) implementation of ridft. I have several structures of the same molecule that differ only in conformation. I prepare the control file with define, using the following input:
define << EOF
a coord
i
idef
f tors 1 2 4 5
f tors 2 4 5 31
f tors 4 5 31 34
f tors 5 31 34 35
f tors 31 34 35 36
f tors 34 35 36 38
f tors 35 36 38 39
f tors 3 7 8 18
f tors 37 41 42 51
f tors 64 65 67 68
f tors 65 67 68 94
f tors 67 68 94 97
f tors 68 94 97 98
f tors 94 97 98 99
f tors 97 98 99 101
f tors 98 99 101 102
f tors 66 70 71 81
f tors 100 104 105 114
ired
*
b all def-SVP
*
eht
-2
dft
on
func pbe
grid
m4
ri
on
m1200
scf
iter
150
conv
7
q
EOF
cosmoprep << EOF
78.5
r "h" 1.300
r "c" 2.000
r "o" 1.720
r "p" 2.106
r "n" 1.830
*
EOF
grep -iv "\$end" control > control_temp
/bin/cat >> control_temp <<EOF
\$disp3 -func pbe -bj -grad
\$end
EOF
mv control_temp control   # put the edited file back in place
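For completeness, the grep/append step above can also be done in a single pass with GNU sed, inserting the $disp3 group directly before the closing $end marker. This is only a sketch on a throwaway demo file (`control_demo` is an illustration; the real target is TURBOMOLE's control file):

```shell
# Demo control file -- stands in for TURBOMOLE's real control file.
printf '$title\n$symmetry c1\n$end\n' > control_demo

# Insert the $disp3 data group directly before the final $end marker
# (GNU sed; \n in the replacement text is a GNU extension).
sed -i 's/^\$end$/$disp3 -func pbe -bj -grad\n$end/' control_demo

cat control_demo
```

This avoids the temporary file and leaves the rest of the control file untouched.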
If I run the sequential or the MPI-parallel ridft, no problem occurs. However, with the GA implementation (which I would like to use because of its noticeable speedup over MPI), SOME of the calculations fail immediately with the following status (content of the job.1 file), while others do not:
OPTIMIZATION CYCLE 1
Tue Jan 31 16:06:20 CET 2012
STARTING rdgrad ON 4 PROCESSORS!
RUNNING PROGRAM /usr/local/programs/turbomole/turbomole-6.3/arch/all/TURBOMOLE/bin/em64t-unknown-linux-gnu_ga/rdgrad_mpi.
PLEASE WAIT UNTIL rdgrad HAS FINISHED.
Look for the output in slave1.output.
MACHINEFILE is /scratch/tmp/36123.1.tq-8-2/machines
<<<<<<<<<<<<<<< OUTPUT FROM PROCESS 0 >>>>>>>>>>>>>>>
distribution of control by ridft_mpi/rdgrad_mpi
operating system is UNIX !
hostname is t05
data group $actual step is not empty
due to the abend of ridft
check reason for abend ...
use the command 'actual -r' to get rid of that
quit: process 0 failing ...
MODTRACE: no modules on stack
CONTRL dead = actual step
rdgrad ended abnormally
error in gradient step (1)
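Side note on the cleanup: as the message says, the stale step entry can be removed with `actual -r` before restarting. When that helper is not at hand, deleting the `$actual step` line from control by hand has roughly the same effect. A sketch on a demo file (it assumes the data group occupies a single line, as it normally does):

```shell
# Demo control file with a stale $actual entry left by the crashed rdgrad.
printf '$title\n$actual step      rdgrad\n$end\n' > control_actual_demo

# Drop the "$actual step" line -- roughly what `actual -r` does
# (assumption: the data group is a single line).
sed -i '/^\$actual step/d' control_actual_demo

cat control_actual_demo
```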
Technical notes: I use TM 6.3, and the systems under study contain approximately 120 atoms. In the control file I set 1200 MB for RI (within define), and I assign 1900 MB per core for the whole job via the PBS system, using 4 cores on one SMP node. What is really strange to me is that some calculations fail and some do not, even though the define input is the same and the input structures are prepared in the same way for all systems. There is no such problem with MPI.
Thank you for your comments.