Author Topic: How to do a parallel HF-involved DFT calculation?

sazabi

How to do a parallel HF-involved DFT calculation?
« on: May 13, 2010, 07:19:28 PM »
From the release information, Turbomole 6.1 should include beta-quality support for such a calculation. But what I get is:


[n198:30100] opal_os_dirpath_create: Error: Unable to create the sub-directory (/openmpi-sessions-xis19@n198_0) of (//openmpi-sessions-xis19@n198_0/59081/0/0), mkdir failed [1]
[n198:30100] [[59081,0],0] ORTE_ERROR_LOG: Error in file util/session_dir.c at line 106
[n198:30100] [[59081,0],0] ORTE_ERROR_LOG: Error in file util/session_dir.c at line 399
[n198:30100] [[59081,0],0] ORTE_ERROR_LOG: Error in file ess_hnp_module.c at line 304
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_session_dir failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[n198:30100] [[59081,0],0] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[n198:30100] [[59081,0],0] ORTE_ERROR_LOG: Error in file orterun.c at line 541


Any suggestions?

Thanks!

uwe

Re: How to do a parallel HF-involved DFT calculation?
« Reply #1 on: May 14, 2010, 03:56:16 PM »
Hi,

are you trying to start the MPI binary with OpenMPI? Turbomole ships with HP-MPI (now called Platform MPI 7), so I doubt that ridft_mpi can be started with OpenMPI's mpirun.

If you have a usual Turbomole installation, with $TURBODIR set correctly and $TURBODIR/scripts in your PATH, then all you have to do is:

export PARA_ARCH=GA
export PARNODES=4

or whatever number of CPUs you would like to use, and then run:

$TURBODIR/bin/`sysname`/ridft

The ridft binary there is actually a wrapper script that sets all the environment variables, the path to the shared libraries, and the correct mpirun, which is part of the Turbomole distribution.
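
Putting it all together, a minimal session could look like this (just a sketch; the installation path /opt/turbomole and the CPU count are placeholder assumptions for your own setup):

# assumed installation path - adjust to your site
export TURBODIR=/opt/turbomole
export PATH=$TURBODIR/scripts:$PATH

# request the parallel (GA) binaries and the number of CPUs
export PARA_ARCH=GA
export PARNODES=4

# the ridft wrapper picks the bundled mpirun and libraries by itself
$TURBODIR/bin/`sysname`/ridft > ridft.out 2>&1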

I would also recommend setting $ricore to zero (see the one-line edit sketched below) - at least on some of our systems with limited memory this avoids problems with excessive shared-memory usage, and $ricore would not speed up the calculation for hybrid functionals anyway. See also:

http://www.cosmologic.de/parallel-faq.html
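
For completeness, $ricore is a keyword in the control file, so this is a one-line edit there (sketch of just that line; the rest of your control file stays untouched):

$ricore 0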

Regards,

Uwe

sazabi

Re: How to do a parallel HF-involved DFT calculation?
« Reply #2 on: May 17, 2010, 06:49:11 AM »
Thank you very much! I will try HP-MPI.

