TURBOMOLE Users Forum


Title: Problem with GA
Post by: 175116 on January 31, 2012, 04:36:11 PM
Hello,

I've run into a puzzling problem with the Global Arrays (GA) parallel implementation of ridft. I have several chemically identical structures that differ only in conformation. I prepare the control file with define using the following input:

Code:
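# define session: read geometry from coord, freeze the listed torsions, generate redundant
# internals (ired), assign def-SVP, EHT guess with charge -2, DFT(PBE)/grid m4, RI with
# 1200 MB, 150 SCF iterations, scfconv 7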
define << EOF


a coord
i
idef
f tors 1 2 4 5
f tors 2 4 5 31
f tors 4 5 31 34
f tors 5 31 34 35
f tors 31 34 35 36
f tors 34 35 36 38
f tors 35 36 38 39
f tors 3 7 8 18
f tors 37 41 42 51
f tors 64 65 67 68
f tors 65 67 68 94
f tors 67 68 94 97
f tors 68 94 97 98
f tors 94 97 98 99
f tors 97 98 99 101
f tors 98 99 101 102
f tors 66 70 71 81
f tors 100 104 105 114



ired
*
b all def-SVP
*
eht

-2

dft
on
func pbe
grid
m4

ri
on
m1200

scf
iter
150
conv
7

q
EOF

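# cosmoprep: switch on COSMO with epsilon = 78.5, keep the remaining defaults, set element-specific radii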
cosmoprep << EOF
78.5








r "h" 1.300
r "c" 2.000
r "o" 1.720
r "p" 2.106
r "n" 1.830
*

EOF

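# strip $end from control and append the DFT-D3(BJ) dispersion keyword, writing the result to control_temp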
cat control | grep -iv "\$end" > control_temp

/bin/cat >> control_temp <<EOF
\$disp3 -func pbe -bj -grad
\$end
EOF


If I then run the sequential or the parallel (MPI) ridft, no problem occurs. However, if I use the GA implementation of ridft (which I would like to, because of the noticeable speedup compared to MPI), SOME of the calculations fail immediately with the following output (content of the job.1 file), while others do not.

Code:
OPTIMIZATION CYCLE 1
Tue Jan 31 16:06:20 CET 2012
STARTING rdgrad ON 4 PROCESSORS!
RUNNING PROGRAM /usr/local/programs/turbomole/turbomole-6.3/arch/all/TURBOMOLE/bin/em64t-unknown-linux-gnu_ga/rdgrad_mpi.
PLEASE WAIT UNTIL rdgrad HAS FINISHED.
Look for the output in slave1.output.
MACHINEFILE is /scratch/tmp/36123.1.tq-8-2/machines
 <<<<<<<<<<<<<<< OUTPUT FROM PROCESS                      0 >>>>>>>>>>>>>>>
 distribution of control by ridft_mpi/rdgrad_mpi
 operating system is UNIX !
 hostname is         t05

 data group $actual step is not empty
 due to the abend of ridft


 check reason for abend ...

 use the command  'actual -r'  to get rid of that

 quit: process                      0  failing ...
 MODTRACE: no modules on stack

  CONTRL dead = actual step
 rdgrad ended abnormally
error in gradient step (1)
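
For reference, the GA run is set up roughly like this (just a sketch: TURBODIR and the binary directory are taken from the log above, and the way the *_ga wrappers are picked up may differ on other installations):

Code:
# rough sketch of the GA job environment (values as in the log above)
export TURBODIR=/usr/local/programs/turbomole/turbomole-6.3/arch/all/TURBOMOLE
export PARNODES=4                                          # 4 cores on one SMP node
export PATH=$TURBODIR/scripts:$TURBODIR/bin/em64t-unknown-linux-gnu_ga:$PATH
jobex -ri > jobex.out 2>&1                                 # optimization with ridft/rdgrad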

Technical notes: I use TURBOMOLE 6.3, and the systems under study contain approximately 120 atoms. In the control file I set 1200 MB for RI (within define), and via the PBS system I assign 1900 MB per core for the whole job. I use 4 cores on one SMP node. What is really strange to me is why some calculations fail and some don't, even though the define input is the same and the structures differ only in conformation. There is no such problem with MPI.
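
The batch request is essentially the following (a sketch in Torque/PBS syntax; resource names may be spelled differently on other queueing systems):

Code:
#!/bin/bash
#PBS -l nodes=1:ppn=4      # 4 cores on one SMP node
#PBS -l pmem=1900mb        # 1900 MB per core
cd $PBS_O_WORKDIR
# (environment setup and jobex call as sketched above)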

Thank you for your comments.