Hello
I've installed TURBOMOLE 5.9.1 on our Rocks cluster, and the serial version works quite well.
After setting the environment variable 'export PATH=$TURBODIR/mpirun_scripts/MPICH2:$PATH' so that TURBOMOLE's bundled MPICH2 does not conflict with the preinstalled MPICH2 version, TM also runs in parallel on 2 CPUs (1 node).
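For reference, my environment setup looks roughly like this (the install path is just an example; the rest follows the TURBOMOLE manual):

  export TURBODIR=/opt/turbomole-5.9.1        # example install path
  export PARA_ARCH=MPI                        # select the parallel binaries
  export PATH=$TURBODIR/scripts:$PATH
  export PATH=$TURBODIR/bin/`sysname`:$PATH   # sysname picks the _mpi binaries when PARA_ARCH=MPI
  # put the bundled MPICH2 first so the cluster-wide MPICH2 is not picked up
  export PATH=$TURBODIR/mpirun_scripts/MPICH2:$PATH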
To use more CPUs I've set 'export PARNODES=8' and 'export HOSTS_FILE=hostsfile', and I've also created an 'mpd.hosts' file in my home directory; a sketch of this setup follows below. By the way, ssh and rsh to the compute nodes work without a password prompt.
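The extra files and commands look roughly like this (node names are placeholders; the hostsfile layout is my reading of the manual):

  # ~/mpd.hosts - one compute node per line, read by mpdboot
  compute-0-0
  compute-0-1
  compute-0-2
  compute-0-3

  # ~/hostsfile - pointed to by $HOSTS_FILE; one node name per line,
  # repeated once per CPU on that node (8 lines in total for PARNODES=8)
  compute-0-0
  compute-0-0
  compute-0-1
  compute-0-1
  compute-0-2
  compute-0-2
  compute-0-3
  compute-0-3

  # start the MPICH2 mpd ring on the nodes and verify it
  mpdboot -n 4 -f ~/mpd.hosts
  mpdtrace

Now, when I start TURBOMOLE, the following error messages appear: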
convgrep will be taken out of the TURBODIR directory
ridft ended abnormally
[cli_1]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -16) - process 1
ridft ended abnormally
[cli_8]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -16) - process 8
OPTIMIZATION CYCLE 1
[cli_5]: aborting job:
Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(795)............................: MPI_Bcast(buf=0xbfffdf68, count=1, dtype=0x4c000430, root=0, comm=0x84000000) failed
MPIR_Bcast(193)...........................:
MPIC_Recv(98).............................:
MPIC_Wait(324)............................:
MPIDI_CH3_Progress_wait(217)..............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(415):
MPIDU_Socki_handle_read(670)..............: connection failure (set=0,sock=6,errno=104:Connection reset by peer)
[cli_2]: aborting job:
Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(795)............................: MPI_Bcast(buf=0xbfffdf68, count=1, dtype=0x4c000430, root=0, comm=0x84000000) failed
MPIR_Bcast(193)...........................:
MPIC_Recv(98).............................:
MPIC_Wait(324)............................:
MPIDI_CH3_Progress_wait(217)..............: an error occurred wh rdgrad ended abnormally
[cli_1]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -16) - process 1
ile handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(415):
MPIDU_Socki_handle_read(670)..............: connection failure (set=0,sock=3,errno=104:Connection reset by peer)
[cli_3]: aborting job:
Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(795)............................: MPI_Bcast(buf=0xbfffdf68, count=1, dtype=0x4c000430, root=0, comm=0x84000000) failed
MPIR_Bcast(193)...........................:
MPIC_Recv(98).............................:
MPIC_Wait(324)............................:
MPIDI_CH3_Progress_wait(217)..............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(415):
MPIDU_Socki_handle_read(670)..............: connection failure (set=0,sock=3,errno=104:Connection reset by peer)
Does anybody know how I can solve this problem?
Thanks
Daniel