Topic: problems with parallel TURBOMOLE 5.10 on x86_64

alexk

« on: July 30, 2008, 10:51:15 PM »

I am trying to run a parallel ridft on a dual-socket, dual-core 64-bit Opteron node running SUSE 10.1 (4 cores in total). Only a single processor is occupied, or sometimes two, even though I ask for 3 worker processes. The wall-clock time is the same as for a standard sequential run. Here is the script I use:

export PARA_ARCH=MPI
export PARNODES=3
export TURBODIR=/libs/TURBOMOLE_5.10
export MPIRUNPATH="$TURBODIR/mpirun_scripts/MPICH2"
export MPICHVER="MPICH2"
export PATH=$PATH:$TURBODIR/bin/x86_64-unknown-linux-gnu:$TURBODIR/bin/x86_64-unknown-linux-gnu_mpi:$TURBODIR/scripts:$TURBODIR/mpirun_scripts
ridft > outfile

Any ideas about what's missing or misassigned? Thanks in advance,

Alex

Arnim

  • Developers
« Reply #1 on: July 31, 2008, 02:09:20 PM »
Hi,

In the 'export PATH...' line, the directory with the serial binaries comes before the one with the parallel binaries, so the serial ridft is found first.
If you swap the two, it should work. It is best, though, to let sysname pick the right directory for you.
This will do the trick:
export PARNODES=3
export PARA_ARCH=MPI
export TURBODIR=<my-path>
export PATH=$TURBODIR/scripts:$PATH
export PATH=$TURBODIR/bin/`sysname`:$PATH

MPICH2 is not used. The parallel binaries work with HP-MPI, and you don't need to set anything for it.

It is also a good idea to set a temporary directory with:
export TURBOTMPDIR=<my-path>
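
The PATH-order point above can be checked with a few lines of shell. This is only a sketch: first_hit is a hypothetical helper, not a TURBOMOLE script. It walks a colon-separated search path and prints the first directory that holds an executable with the given name, which is the one the shell would run.

```shell
# Sketch only: first_hit is a hypothetical helper, not part of TURBOMOLE.
# Usage: first_hit <program> <colon-separated search path>
# Prints the first directory containing an executable <program>;
# returns 1 if no directory on the search path contains one.
first_hit() {
    prog=$1
    old_ifs=$IFS
    IFS=:
    for dir in $2; do
        # An empty PATH entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -x "$dir/$prog" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

Calling `first_hit ridft "$PATH"` after setting up the environment should print the directory ending in _mpi if the parallel binaries are picked up first. The shell built-in `command -v ridft` gives the same answer for the current PATH.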
« Last Edit: July 31, 2008, 04:22:07 PM by Arnim »

alexk

« Reply #2 on: August 22, 2008, 08:01:50 PM »
Thanks Arnim,

I am now testing parallel batch jobs under TORQUE with the Maui scheduler and running into problems. Here is one example of a failed job, with the following error in the outfile:

---------------------------------------------------------------------------------------------------
STARTING ridft ON 3 PROCESSORS!
RUNNING PROGRAM
/libs/TURBOMOLE_5.10/bin/x86_64-unknown-linux-gnu_mpi/ridft_mpi.
PLEASE WAIT UNTIL ridft HAS FINISHED.
Look for the output in slave1.output.
MACHINEFILE is /var/torque/aux//7145
No file slave1.output found?
---------------------------------------------------------------------------------------------------

The STDERR file contains

        MPI Application rank 0 exited before MPI_Finalize() with status 16


I suppose it may be related to the way TORQUE assigns the "nodefile" or "machinefile". If anyone is familiar with this kind of problem, please let me know. Thanks in advance,

Alex
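
One way to narrow down the nodefile question is to print what the scheduler actually handed to the job. The sketch below assumes the job runs under TORQUE, which exports the nodefile path as PBS_NODEFILE inside a job. count_slots is a hypothetical helper, and nothing here is a confirmed fix for the exit status 16 above; it only shows whether the number of allocated slots matches PARNODES.

```shell
# Sketch only: count_slots is a hypothetical helper, not part of
# TURBOMOLE or TORQUE. It counts the non-empty lines of a nodefile,
# which normally has one line per allocated processor slot.
count_slots() {
    grep -c . "$1"
}

# PBS_NODEFILE is set by TORQUE inside a running job.
nodefile=${PBS_NODEFILE:-}
if [ -n "$nodefile" ] && [ -r "$nodefile" ]; then
    echo "nodefile: $nodefile"
    echo "slots allocated: $(count_slots "$nodefile")"
    echo "PARNODES requested: ${PARNODES:-unset}"
    # Show how many slots fall on each host.
    sort "$nodefile" | uniq -c
else
    echo "PBS_NODEFILE is unset or unreadable (not inside a TORQUE job?)"
fi
```

Placed in the job script just before the ridft call, this writes the allocation into the job's output so it can be compared with the MACHINEFILE line that TURBOMOLE prints.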