Hello,
I'm having the same problem with parallel DFT (without RI). I can successfully run smaller parallel jobs with the same basis set, functional, and scheduler script. This one is rather large, and I cannot run it in serial mode to check whether the problem lies with the job itself. The limits on the nodes look alright:
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 65536
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 65536
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
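For completeness: the listing above is what `ulimit -a` reports, and to be sure these are the limits the parallel processes actually inherit, I check them from inside a batch job rather than on the head node. A small sketch (the `grep` filter is just my own convenience, not part of any tool):

```shell
#!/bin/sh
# Run this inside the gridengine job script so the limits reported are the
# ones the compute-node shell actually passes on to the parallel binaries.
ulimit -a
# Print only the limits that are finite -- e.g. "max locked memory" and
# "open files" above are not unlimited and worth a second look.
ulimit -a | grep -v unlimited
```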
The system is running Rocks 5.1 (basically a clone of RHEL 5.1), using Grid Engine 6.1 as the scheduler, kernel 2.6.18-92.1.13.el5.
The TURBOMOLE version is 5.10.
------------------
tail job.last:
scf.post 9.1 0.01 9.1 0.01
dscf.postscf 129350.6 100.00 1232845849.4 ******
fine, there is no data group "$actual step"
next step = grad
------------------
tail job.1:
cannot find any information which may be used to optimize geometry ...
MODTRACE: no modules on stack
so long GRANAT !
relax ended abnormally
relax step ended abnormally
next step = relax
------------------
I'm running with MPI flags to reduce the CPU load of the server process: "-e MPI_FLAGS=y0 -np 1" (there is another thread here about this).
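For reference, the relevant fragment of my Grid Engine submit script looks roughly like this. It is a sketch from memory, not verbatim: the install path, the parallel environment name `mpi`, and the variable used to hand the options to mpirun all vary between installations, so treat every name below as an assumption to check against your own setup:

```shell
#!/bin/sh
#$ -cwd
#$ -pe mpi 8                      # parallel environment name is site-specific
# Hypothetical install path -- adjust to your TURBOMOLE 5.10 tree.
export TURBODIR=/opt/turbomole
export PARA_ARCH=MPI              # select the parallel binaries
export PATH=$TURBODIR/bin/`sysname`:$PATH
# "-e MPI_FLAGS=y0" makes the server process yield instead of busy-waiting,
# "-np 1" keeps it to one process (see the other thread). The hand-over
# variable name below is a guess -- check how your wrapper expects it.
export MPIRUN_OPTIONS="-e MPI_FLAGS=y0 -np 1"
jobex > jobex.out 2>&1
```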
The system is rather large:
total number of primitive shells : 265
total number of contracted shells : 762
total number of cartesian basis functions : 2893
total number of SCF-basis functions : 2438
I would be happy to supply further info or files if that helps. Any ideas?