We have encountered some strange crashes while running ricc2 in parallel (TURBOMOLE version 5.9, architecture em64t). I'm attaching the output from one such case. The system had about 1800 basis functions in C1 symmetry and was run on a 4-processor workstation with "$numprocs 4".
Here is the output before the crash:
======== CC DENSITY MODULE ========
current wave-function model: MP2
calculating CC ground state density
a semicanonical algorithm will be used
density nr. cpu/min wall/min L R
------------------------------------------------------
rank 0 in job 1 tremaine.joensuu.fi_33101 caused collective abort of all ranks
exit status of rank 0: return code 1
Program ricc2_mpi has ended.
Shutting down unused mpd ring.
Here is the corresponding output from stderr (the batch system log):
[cli_0]: aborting job:
Fatal error in MPI_Sendrecv: Invalid count, error stack:
MPI_Sendrecv(217): MPI_Sendrecv(sbuf=0x2ac9879010, scount=-99655065, dtype=0x4c000829, dest=0, stag=88, rbuf=0x2acd7fa010, rcount=-99655065, dtype=0x4c000829, src=0, rtag=88, MPI_COMM_WORLD, status=0x142f210) failed
MPI_Sendrecv(108): Negative count, value is -99655065
What might cause this error? dscf ran fine in parallel, and so did ricc2 until the CC density module.
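My own guess, and it is purely speculative, is that the negative count comes from a 32-bit integer overflow when the size of the buffer passed to MPI_Sendrecv is computed for such a large C1 case. Here is a minimal C sketch of how a count like that could end up negative; the dimensions are made up and only meant to illustrate the wrap-around, they are not taken from the actual ricc2 code:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* hypothetical dimensions, roughly the size of our job */
    int64_t nbf  = 1800;   /* basis functions            */
    int64_t nocc = 900;    /* made-up orbital dimension  */

    int64_t elements = nbf * nbf * nocc;   /* true element count        */
    int32_t count32  = (int32_t)elements;  /* truncated to 32-bit count */

    printf("true element count: %lld\n", (long long)elements);
    printf("as 32-bit integer : %d\n", count32);   /* prints a negative value */
    return 0;
}

If something like this happens inside ricc2, it would also fit the observation that only the large C1 cases are affected.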
By the way, I noticed that the control file included the parameter "$parallel_platform cluster". It seems that in all the mpirun_scripts the em64t architecture is assigned "$parallel_platform cluster". I guess "$parallel_platform MPP" would be more appropriate? Can the value of $parallel_platform cause problems in ricc2?
Update: A similar calculation that used Ci symmetry did not crash the way the C1-symmetric calculations did. So we will now check whether our calculations work better with the simplified C1 algorithm turned off.
Update 2: With the simplified C1 algorithm turned off, ricc2 was able to calculate the MP2 energy, but it still crashed in the "LINEAR CC RESPONSE SOLVER", with a similar MPI error message (Fatal error in MPI_Sendrecv).