TURBOMOLE Users Forum

Installation and usage of TURBOMOLE => Parallel Runs => Topic started by: martijn on July 20, 2012, 02:58:02 PM

Title: Re: Difference in MPI runs between 6.2 and 6.3 (or TM on Ubuntu 12.04)
Post by: martijn on July 20, 2012, 02:58:02 PM
Hi,

We have an older 8-core workstation (running Ubuntu) with TM version 6.2 installed on it. On this workstation the following simple script is sufficient to run an MPI TM job (e.g. jobex):

#!/bin/bash

# setting up the environment for turbomole
export TURBODIR=/home/martijn/Turbomole/TURBOMOLE/
export PARA_ARCH=MPI
source $TURBODIR/Config_turbo_env
export PARNODES=3

jobex -gcart 5 -c 100

Now we have just installed TM 6.3.1 on a newer 12-core workstation (also running Ubuntu), and there the simple script above doesn't work. The MPI job never starts properly and essentially hangs after dscf.stats.parallel. We've found that the following script seems to work:

#!/bin/bash

# setting up the environment for turbomole

echo "localhost">>machinefile
echo "localhost">>machinefile


export TURBODIR=/home/cris/software/TURBOMOLE
source $TURBODIR/Config_turbo_env
export PATH=$TURBODIR/scripts:$PATH
export PATH=$TURBODIR/bin/`sysname`:$PATH

export PARA_ARCH=MPI

export PARANODES=2

export HOSTS_FILE="machinefile"

jobex -gcart 4 -c 100

But now we need a machinefile, something we never needed on the old workstation with TM 6.2. We also need to set a path so that TM finds HP-MPI rather than openmpi (again, something that was never an issue on the old workstation). We did this by editing /etc/ld.so.conf to:

include /etc/ld.so.conf.d/*.conf
/home/cris/software/TURBOMOLE/mpirun_scripts/em64t-unknown-linux-gnu_mpi/HPMPI/MPICH2.0/lib/linux_amd64/
/home/cris/software/TURBOMOLE/mpirun_scripts/em64t-unknown-linux-gnu_mpi/HPMPI/lib/linux_amd64/
/home/cris/software/TURBOMOLE/libso/em64t-unknown-linux-gnu_mpi/

So somehow the sourcing of the login-file for TM does not seem to work properly.  

While this now runs, MPI jobs sometimes die with MPI errors. Again, this is something we never saw with 6.2 on the old workstation.

Did we do something wrong or could this be a bug/feature related to the new HP-MPI library that came in with version 6.3?

Thanks in advance,

Martijn
Title: Re: Difference in MPI runs between 6.2 and 6.3
Post by: uwe on July 23, 2012, 01:51:53 PM
Hello,

there is no difference between Turbomole 6.2 and 6.3 when starting the parallel version. Even the MPI version is the same, namely:

Platform MPI 07.01.00.00 [8144] Linux x86-64

You do not have to set a machine file if you run on a single node, and I would strongly recommend not adding any Turbomole-related shared libraries to the default search path of the shared library loader (too dangerous, it could break other programs on your system).
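As a per-job alternative to editing the system-wide loader configuration, the library directories can be prepended to LD_LIBRARY_PATH inside the job script. This is only a minimal sketch, not a tested Turbomole recipe: the directories are the ones quoted earlier in this thread, the TURBODIR default and the helper name prepend_libdir are assumptions for illustration, and whether Config_turbo_env already sets these paths on a given installation has not been checked here. Note also that later in this thread the problem is traced to the environment not reaching the MPI client processes, so this may not be sufficient on its own.

```shell
#!/bin/bash
# Assumption: TURBODIR as in the second script of this thread.
TURBODIR=${TURBODIR:-/home/cris/software/TURBOMOLE}
ARCH_DIR=em64t-unknown-linux-gnu_mpi

# Hypothetical helper: prepend a directory to LD_LIBRARY_PATH for this shell
# and its children, skipping missing directories and duplicates.
prepend_libdir() {
    [ -d "$1" ] || return 0
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;    # already listed, nothing to do
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Directories taken from the ld.so.conf lines quoted earlier in the thread.
prepend_libdir "$TURBODIR/libso/$ARCH_DIR"
prepend_libdir "$TURBODIR/mpirun_scripts/$ARCH_DIR/HPMPI/lib/linux_amd64"
prepend_libdir "$TURBODIR/mpirun_scripts/$ARCH_DIR/HPMPI/MPICH2.0/lib/linux_amd64"
```

Because the change lives in the job script, it affects only that job and its child processes and leaves other programs on the system untouched.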

It is more likely that the newer Ubuntu version has some settings which prevent Turbomole from running. Did you try to run 6.3 on the older system and 6.2 on the newer one?

Adding a machine file which contains just 'localhost' might help in cases where MPI does not resolve the name of the node correctly. Is the hostname defined in /etc/hosts, and is it a local address (127.0.0.X) or does it have an IP address on your local network?

Does the job run on a local disk or on a network disk?
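Both questions can be answered quickly from a shell in the job directory. The following is a rough diagnostic sketch, assuming a Linux system with getent and a df that supports -T; the helper names classify_addr and classify_fs are made up for illustration, and the list of network filesystem types is not exhaustive.

```shell
#!/bin/bash
# Classify an address as loopback (127.x.x.x or ::1) or a network address.
classify_addr() {
    case "$1" in
        127.*|::1) echo "loopback" ;;
        *)         echo "network" ;;
    esac
}

# Classify a filesystem type name as network-mounted or local.
# The list of network types is illustrative, not exhaustive.
classify_fs() {
    case "$1" in
        nfs*|cifs|smb*|fuse.sshfs|lustre|gpfs|afs|ceph) echo "network" ;;
        *) echo "local" ;;
    esac
}

# How do localhost and this machine's own name resolve?
for name in localhost "$(uname -n)"; do
    getent hosts "$name" | while read -r addr _; do
        echo "$name -> $addr ($(classify_addr "$addr"))"
    done
done

# Which filesystem holds the current (job) directory?
fstype=$(df -PT . | awk 'NR == 2 { print $2 }')
echo "$PWD: $fstype ($(classify_fs "$fstype"))"
```

If the machine's own name produces no output at all, it is not resolvable, which is the case where a machine file containing 'localhost' might help.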

Regards,

Uwe
Title: Re: Difference in MPI runs between 6.2 and 6.3
Post by: martijn on July 24, 2012, 05:47:25 PM
Hi,

Okay. It looks like this is indeed an Ubuntu issue (the old machine runs Ubuntu 10.04 while the new machine runs Ubuntu 12.04). I've tried to run TM 6.2 (after removing the library paths) and get the same problems as with 6.3. More specifically, the standard output is:

convgrep will be taken out of the TURBODIR directory
dscf ended abnormally
dscf ended abnormally
dscf ended abnormally
MPI Application rank 1 exited before MPI_Finalize() with status 13
OPTIMIZATION CYCLE 1
grad ended abnormally
grad ended abnormally
grad ended abnormally
MPI Application rank 0 exited before MPI_Finalize() with status 13
 statpt ended normally
dscf ended abnormally
dscf ended abnormally
dscf ended abnormally
MPI Application rank 1 exited before MPI_Finalize() with status 13
OPTIMIZATION CYCLE 2
grad ended abnormally
grad ended abnormally
grad ended abnormally


The misunderstanding about 6.3 shipping a different MPI version than 6.2 arose because the version history for 6.3 says that "Platform MPI 7.1 is included in the Turbomole distribution".

Yes, localhost is defined in /etc/hosts as 127.0.0.1 and the calculation runs on the disk of the workstation (i.e. not network mounted).

I'm wondering if anyone else is successfully running TM on Ubuntu 12.04.

Cheers,

Martijn
Title: Re: Difference in MPI runs between 6.2 and 6.3
Post by: uwe on July 25, 2012, 12:02:16 PM
Hello,

Turbomole 6.4 uses Platform MPI 8.2 and does run on Ubuntu 12 without any modifications of Turbomole.

The problem with the combination of Turbomole 6.3 and Ubuntu 12 seems to be that the environment variables like LD_LIBRARY_PATH and the path where the input is located are not exported to the client processes. Some strange Ubuntu-sshd-MPI misunderstanding.

That is also why setting an explicit machinefile helps, because in this case Turbomole's own scripts explicitly set the path to the shared libraries and the working directory.
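For reference, the machinefile workaround can be written so that repeated runs do not keep appending lines (the script earlier in this thread uses >>, which grows the file on every run). This is a minimal sketch, assuming one localhost line per process as in that script; write_machinefile is a hypothetical helper, PARNODES is spelled as in the first script of the thread, and using an absolute path for HOSTS_FILE is a choice made here, not a documented requirement.

```shell
#!/bin/bash
# Hypothetical helper: write a machinefile with one "localhost" line per process.
# The single redirection truncates the file first, so reruns do not accumulate lines.
write_machinefile() {
    mf_file=$1
    mf_count=$2
    mf_i=0
    while [ "$mf_i" -lt "$mf_count" ]; do
        echo "localhost"
        mf_i=$((mf_i + 1))
    done > "$mf_file"
}

export PARA_ARCH=MPI
export PARNODES=2
write_machinefile machinefile "$PARNODES"
export HOSTS_FILE="$PWD/machinefile"
```

The jobex call then follows as in the original script.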

So either install Turbomole 6.4, or ask Turbomole support for a patch for 6.3.

Regards,

Uwe
Title: Re: Difference in MPI runs between 6.2 and 6.3 (or TM on Ubuntu 12.04)
Post by: martijn on July 25, 2012, 02:20:30 PM
Hi Uwe,

We'll download Turbomole 6.4 and give it a try.

Thanks,

Martijn
Title: Re: Difference in MPI runs between 6.2 and 6.3 (or TM on Ubuntu 12.04)
Post by: martijn on August 19, 2012, 08:43:49 PM
Thanks! I am happy to report that using Turbomole 6.4 solved all problems.

Best,

Martijn