Hi Chris,
Well, here we go: first post for the new parallel version :-)
The first, and often quite difficult, step is to find out what originally caused the abort. Since all processes will print error messages sooner or later, it is not easy to tell which message came first.
In your case, I assume that the reason is a memory problem:
MPI Application rank 3 exited before MPI_Finalize() with status 11
Signal 11 is a segmentation fault, so please check:
- stack size limit
- shared memory limits, e.g. /proc/sys/kernel/shmall and /proc/sys/kernel/shmmax
- shared memory that is still allocated: try ipcs and, if there are shared memory segments that are not attached to any process, remove them with ipcrm -m <id>
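If it helps, the signal lookup and the three checks above can be scripted. What follows is only a minimal Python sketch, assuming a Linux node and the usual column layout of ipcs -m (key, shmid, owner, perms, bytes, nattch). The helper names are made up for illustration and are not part of any program mentioned here. It only prints candidate ipcrm commands and does not remove anything.

```python
import resource
import signal
import subprocess


def signal_name(number):
    """Translate a signal number (e.g. the 11 in 'status 11') into its name."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return "unknown ({})".format(number)


def stack_limit():
    """Soft stack size limit of the current process as a readable string."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft == resource.RLIM_INFINITY:
        return "unlimited"
    return "{} kB".format(soft // 1024)


def read_kernel_value(path):
    """Read a single value from /proc; return 'n/a' if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return "n/a"


def unattached_segments(ipcs_output):
    """Return the shmids from 'ipcs -m' output whose nattch column is 0.

    Assumed row layout: key shmid owner perms bytes nattch [status]
    """
    ids = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows start with a hexadecimal key such as 0x00000000.
        if len(fields) >= 6 and fields[0].startswith("0x") and fields[5] == "0":
            ids.append(int(fields[1]))
    return ids


if __name__ == "__main__":
    print("signal 11      :", signal_name(11))
    print("stack limit    :", stack_limit())
    print("shmall (pages) :", read_kernel_value("/proc/sys/kernel/shmall"))
    print("shmmax (bytes) :", read_kernel_value("/proc/sys/kernel/shmmax"))
    try:
        result = subprocess.run(["ipcs", "-m"], capture_output=True, text=True)
        for shmid in unattached_segments(result.stdout):
            print("unattached segment, candidate for: ipcrm -m", shmid)
    except FileNotFoundError:
        print("ipcs not found on this machine")
```

Note that the stack limit reported is the one inherited by the Python process, so the script should be started from the same shell (and on the same node) as the parallel job. Segments with nattch 0 may also belong to other programs, so check the owner column before removing anything.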
Then, if all that looks good, I would suggest that you:
- run on 2 CPUs only
- and set $ricore to 0
to test if ridft does run at all for your input.
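For the $ricore part, here is a rough Python sketch of how the setting could be changed in a script. It assumes the input is a plain text file with one keyword per line, a line of the form "$ricore <number>", and a closing "$end" line. The function name set_ricore is hypothetical, and the number of CPUs still has to be set separately in whatever way your installation expects.

```python
def set_ricore(control_text, value=0):
    """Return control_text with the $ricore keyword set to the given value.

    Assumption: keywords sit at the start of a line and the file ends
    with a '$end' line. If no $ricore line exists, one is added.
    """
    lines = control_text.splitlines()
    new_line = "$ricore {}".format(value)

    # Replace an existing $ricore line if there is one.
    for i, line in enumerate(lines):
        if line.split()[:1] == ["$ricore"]:
            lines[i] = new_line
            return "\n".join(lines) + "\n"

    # Otherwise insert the keyword just before $end, or append it.
    for i, line in enumerate(lines):
        if line.strip() == "$end":
            lines.insert(i, new_line)
            break
    else:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = "$title\n$ricore 500\n$marij\n$end\n"
    print(set_ricore(example), end="")
```

Work on a copy of the input file so that the original setting can be restored once the test run has gone through.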
The shared memory is used for several arrays (density, Fock, orbitals, ...), and its size is limited to avoid excessive memory usage (and swapping). $ricore does not speed up the calculation that much, especially if you are using $marij (which should be switched on by default in all cases).
Regards,
Uwe