Topic: which: no prsh in

segalemb

which: no prsh in
« on: February 17, 2012, 06:44:23 PM »
I recently moved to an SGI Altix 370 cluster with 4 nodes, running SUSE SLES 11 SP1. When I tried to run a parallel job, for example ridft, the result was:

which: no prsh in (/home/sergio/usr/orca:/home/usr/topmod09/bin:/home/usr/Q-Chem/bin:/home/usr/Q-Chem/exe:/home/usr/Q-Chem/util:/home/usr/pgi/linux86-64/10.6/bin:/home/usr/nbo59:/home/usr/TURBOMOLE/bin/em64t-unknown-linux-gnu_mpi:/home/usr/TURBOMOLE/scripts:/home/usr/TURBOMOLE/bin/em64t-unknown-linux-gnu:/sbin:/bin:/usr/sbin:/usr/bin:/usr/games:/usr/local/sbin:/usr/local/bin:/usr/X11R6/bin:/home/sergio/bin:/opt/kde/bin:/usr/orca:/usr/openmpi-1.4.2/bin:/home/usr/g09/bsd:/home/usr/g09/local:/home/usr/g09/extras:/home/usr/g09)
STARTING ridft ON 4 PROCESSORS!
RUNNING PROGRAM /home/usr/TURBOMOLE/bin/em64t-unknown-linux-gnu_mpi/ridft_mpi.
PLEASE WAIT UNTIL ridft HAS FINISHED.
Look for the output in slave1.output.
MACHINEFILE is /home/usr/TURBOMOLE/hosts
ndif: Command not found.
nv: Command not found.
ndif: Command not found.
Badly placed ()'s.
ndif: Command not found.
nv: Command not found.
ndif: Command not found.
Badly placed ()'s.
ndif: Command not found.
nv: Command not found.
ndif: Command not found.
Badly placed ()'s.
ndif: Command not found.
nv: Command not found.
ndif: Command not found.
Badly placed ()'s.
ridft_mpi: Rank 0:4: MPI_Init: didn't find active interface/port
ridft_mpi: Rank 0:4: MPI_Init: Can't initialize RDMA device
ridft_mpi: Rank 0:4: MPI_Init: MPI BUG: Cannot initialize RDMA protocol
MPI Application rank 3 killed before MPI_Init() with signal 15
MPI Application rank 4 exited before MPI_Init() with status 1
forrtl: error (78): process killed (SIGTERM)
MPI Application rank 2 exited before MPI_Init() with status 1
forrtl: error (78): process killed (SIGTERM)
forrtl: error (78): process killed (SIGTERM)
MPI Application rank 1 exited before MPI_Init() with status 1
No file slave1.output found?

Thanks,

Sergio

uwe

Re: which: no prsh in
« Reply #1 on: February 20, 2012, 10:50:04 AM »
Hi Sergio,

Whenever Turbomole fails to run, please contact Turbomole support for help.

Isn't the SGI Altix 370 an Itanium system? Or is it an Altix XE or ICE?

It seems that Turbomole's sysname script detected an EM64T/x86_64 system. I know that Itanium CPUs can emulate x86 code, so such binaries would run (just very, very slowly), but as far as I know they cannot emulate x86_64 code.
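
A quick way to settle the Itanium-versus-Xeon question is to ask each node what architecture its kernel reports. The sketch below is only an illustration and is not part of Turbomole: the helper name classify_machine is made up here, and it assumes Python is available on the nodes. On Linux, platform.machine() normally reports the same string as "uname -m", typically ia64 on Itanium and x86_64 on EM64T/AMD64 systems.

```python
import platform


def classify_machine(machine):
    """Map a machine string (as reported by `uname -m`) to a rough CPU family.

    Hypothetical helper for illustration; not part of Turbomole.
    """
    m = machine.lower()
    if m == "ia64":
        return "Itanium (IA-64)"
    if m in ("x86_64", "amd64"):
        return "EM64T/x86_64"
    if m in ("i386", "i486", "i586", "i686"):
        return "32-bit x86"
    return "unknown: " + machine


if __name__ == "__main__":
    # Run this on every node listed in the machine file and compare the results.
    reported = platform.machine()
    print(reported, "->", classify_machine(reported))
```

If a node reports ia64, the em64t-unknown-linux-gnu binaries are the wrong ones for that machine.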

Assuming that this is a Xeon-based system:

First of all, the prsh warning indicates that you are running an older Turbomole version. I'd recommend installing and using the latest release, which also includes a newer PlatformMPI version.

Turbomole uses PlatformMPI, and usually you do not have to change or set anything to use it. Unless you tried to start the binaries yourself with another MPI version, Turbomole's own scripts will set up everything.

Two things are unusual:

Quote
ndif: Command not found.
nv: Command not found.
ndif: Command not found.
Badly placed ()'s.

Neither ndif nor nv is a command called by the Turbomole scripts or by PlatformMPI. I have no idea where those errors come from.
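
One observation that may help narrow this down: the wording of these messages ("Command not found.", "Badly placed ()'s.") matches what csh/tcsh prints, and the block appears four times in the log, the same number as the processors requested. That would be consistent with a csh-type shell running something each time a process is started, though this is only a guess. A minimal sketch for tallying the messages from a saved log follows; BLOCK is copied from the log in the first post, and tally is a hypothetical helper name.

```python
from collections import Counter

# One of the four identical blocks from the log in the first post.
BLOCK = """ndif: Command not found.
nv: Command not found.
ndif: Command not found.
Badly placed ()'s.
"""
# The block occurs four times in the posted output.
LOG = BLOCK * 4


def tally(log_text):
    """Count how often each distinct non-empty line occurs in the log."""
    return Counter(
        line.strip() for line in log_text.splitlines() if line.strip()
    )


if __name__ == "__main__":
    for message, count in tally(LOG).most_common():
        print(count, message)
```

If the counts scale with the number of processes when the job is rerun with a different processor count, the shell startup files on the nodes would be a reasonable place to look next.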

Quote
ridft_mpi: Rank 0:4: MPI_Init: didn't find active interface/port
ridft_mpi: Rank 0:4: MPI_Init: Can't initialize RDMA device
ridft_mpi: Rank 0:4: MPI_Init: MPI BUG: Cannot initialize RDMA protocol

MPI did not find an active interconnect but assumes that an InfiniBand network is present. On the SGI ICE and UV systems that I know, the default Turbomole version with PlatformMPI runs without problems, but on those systems SGI's MPT (Message Passing Toolkit) from SGI ProPack is installed. You could try TCP/IP instead to check whether this is just a problem with the InfiniBand drivers.
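
As a hedged illustration of the TCP/IP test: as far as I know, HP-MPI and Platform MPI document an environment variable MPI_IC_ORDER that controls the order in which interconnects are tried, and an uppercase entry such as TCP forces that choice. Whether the scripts of your Turbomole version pass this variable through to the MPI processes is an assumption to verify against the Turbomole and PlatformMPI documentation for your release. The sketch below only builds the environment; the helper name tcp_only_env and the commented-out launch line are hypothetical.

```python
import os


def tcp_only_env(base_env=None):
    """Return a copy of the environment with MPI_IC_ORDER forced to TCP.

    Assumption: the installed Platform MPI honours MPI_IC_ORDER and the
    launching scripts forward it to the MPI processes.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MPI_IC_ORDER"] = "TCP"  # uppercase is meant to demand this interconnect
    return env


if __name__ == "__main__":
    env = tcp_only_env()
    print("MPI_IC_ORDER =", env["MPI_IC_ORDER"])
    # Hypothetical launch, shown only as a comment because the exact
    # command depends on the local Turbomole setup:
    # subprocess.run(["ridft"], env=env)
```

If the job starts with TCP/IP but fails with the default settings, the problem is more likely in the InfiniBand setup than in Turbomole itself.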

Regards,

Uwe