Author Topic: scratch directory in ridft  (Read 9470 times)

Bob79

  • Newbie
  • *
  • Posts: 3
  • Karma: +0/-0
scratch directory in ridft
« on: August 21, 2013, 04:12:05 PM »
Hi folks,

can anyone enlighten me how exactly to define a local scratch directory for the intermediate files of ridft runs, both serial and parallel? I tried setting $TURBOTMPDIR, but it seemed to have no effect...

Thanks,
Bob

Hauke

  • Full Member
  • ***
  • Posts: 37
  • Karma: +0/-0
Re: scratch directory in ridft
« Reply #1 on: August 21, 2013, 06:30:26 PM »
Setting $TURBOTMPDIR should be correct.
You have to make sure that this variable is also set on the machine the job actually runs on, not just on the login node.
What result do you get when you run
echo $TURBOTMPDIR
before you start ridft on the same machine?
What is written in the top lines of the parallel ridft output? In my version 6.3.1 it is something like
Quote
Parallel program ridft_mpi will be taken out of the TURBODIR directory.
TURBOTMPDIR environment variable set to "/scratch/hauke".

Bob79

  • Newbie
  • *
  • Posts: 3
  • Karma: +0/-0
Re: scratch directory in ridft
« Reply #2 on: August 21, 2013, 07:41:19 PM »
It does seem to recognize the environment variable. However, I don't see any difference whether I leave it empty or set it to a directory. It also does not matter whether that directory exists or not; I always get:

Code: [Select]
TURBOTMPDIR environment variable set to "/temp/does/not/exist".
This directory must exist and be writable by the master process (slave1).
STARTING ridft ON 2 PROCESSORS!

The problem we have is the following: we need to run ridft for >4000 molecules and have a sort of master program that does the bookkeeping and distributes the jobs to cluster nodes. We get the best performance by running several serial ridft instances at the same time. The coord and control files live on an NFS share, and each serial ridft also puts its diff* and diis* files into the same NFS directory, which of course slows execution down badly. We would like to tell Turbomole to put those files on the local scratch disk of the cluster node instead.

When I run ridft in parallel, all I see is a directory "MPI-TEMPDIR-001" in the main directory containing two diis* files; I don't see where the diff* files end up.

Hauke

  • Full Member
  • ***
  • Posts: 37
  • Karma: +0/-0
Re: scratch directory in ridft
« Reply #3 on: August 21, 2013, 09:08:57 PM »

I agree with your plan of running these 4000 ridft calculations in serial mode.

Code: [Select]
TURBOTMPDIR environment variable set to "/temp/does/not/exist".
This directory must exist and be writable by the master process (slave1).

As the message says, the directory must exist and be writable. Did you check if this is the case?
(for example by
mkdir -p $TURBOTMPDIR
touch ${TURBOTMPDIR}/testfile
)

The TM 6.5 manual says:

Quote
In MPI parallel runs the programs attach a node-specific extension (e.g. /scratch/username/tmjob-001) to the name given in $TURBOTMPDIR, to avoid clashes between processes that access the same file system. The jobs must have permission to create these directories. Therefore one must not set $TURBOTMPDIR to something like /scratch, which would result in directory names like /scratch-001 that usually cannot be created by jobs running under a standard user id.

Maybe you should also consider this in your test cases.
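To illustrate that manual passage (the paths below are made-up examples, not Turbomole defaults), the variable should point at a per-user subdirectory rather than at the scratch root:

```shell
# The MPI binaries append a node-specific suffix such as "-001" to
# $TURBOTMPDIR, so point it at a directory you own (example paths):
export TURBOTMPDIR=/scratch/$USER/tmjob   # processes use /scratch/$USER/tmjob-001, ...

# Bad: this would require creating /scratch-001 at the file-system
# root, which a normal user id cannot do:
# export TURBOTMPDIR=/scratch
```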


By the way, I would recommend deleting the $scfdump keyword from your control file; otherwise the SCF information is written back to NFS at every cycle, and you usually don't need this intermediate information (only for restarting...). The (converged) final MOs are written back anyway.
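If you need to strip $scfdump from many control files in one go, `kdg scfdump` is the Turbomole way; the sed stand-in below is just a sketch, assuming $scfdump occupies a single line (which it normally does):

```shell
# Remove the $scfdump data group from one control file.
# "kdg scfdump" does this via the Turbomole scripts; the sed line is a
# plain-shell equivalent, assuming $scfdump is a single-line keyword.
strip_scfdump() {
    sed -i '/^\$scfdump/d' "$1"
}

# Example batch use (directory names are made up):
# for d in job_*; do strip_scfdump "$d/control"; done
```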


Bob79

  • Newbie
  • *
  • Posts: 3
  • Karma: +0/-0
Re: scratch directory in ridft
« Reply #4 on: August 22, 2013, 08:57:02 AM »
Yeah, directory exists and is writable. Thanks for the suggestion about $scfdump.

Hauke

  • Full Member
  • ***
  • Posts: 37
  • Karma: +0/-0
Re: scratch directory in ridft
« Reply #5 on: August 22, 2013, 10:32:13 AM »

I remember that I sometimes had problems with $TURBOTMPDIR set to too long a path. I don't remember the exact limit, but maybe you could test with a shorter $TURBOTMPDIR (and also avoid spaces and other special characters in it).

uwe

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 560
  • Karma: +0/-0
Re: scratch directory in ridft
« Reply #6 on: August 22, 2013, 02:17:06 PM »
Hi,

setting $TURBOTMPDIR only works if you run the parallel version. If you get MPI-TEMPDIR-* directories, then indeed the parallel version was started.

You have several options:

  • Modify your master script or your queuing-system script so that it first copies the input to a local disk, then starts the job, and finally copies all files back to the initial NFS directory. That is what we do in such cases. If your NFS disk is still too slow, you can tar and gzip the files on the local disk after the job has finished and copy back just the one tar.gz file to NFS; this is usually faster than copying all the individual files.

  • Do what the parallel version of Turbomole does: instead of setting $TURBOTMPDIR, add the keyword $tmpdir to the control file. I'd recommend doing that within a script:

    kdg end
    echo "\$tmpdir /scratch/user/job-$$" >> control
    echo "\$end" >> control
    mkdir -p /scratch/user/job-$$

    here, just as an example, the process id of the script is used to give each scratch directory an individual name (this works only if each job is started by its own copy of the script). Or just set up a counter and use it in the directory name.

  • The latest version of the GUI, TmoleX 3.4, contains some batch-job options. It is possible to read in a number of molecules, define a job template (one or several subsequent jobs with individual settings like basis set, method, job type, etc.), and submit it to a remote system using several CPUs. TmoleX will split the list across the CPUs and run the jobs in serial mode at the same time. If you set a local scratch directory as the working directory, the jobs will all run on local disks only. It is, however, new and probably not yet suited to utilizing many different nodes within a queuing system...

  • A quick-and-dirty way is to leave the input on the NFS disk but generate, for each job, a symbolic link to a local scratch directory:

    mkdir /tmp/scratch-space-for-me/
    ln -s /tmp/scratch-space-for-me/ ./scratch


    and then add $tmpdir ./scratch to all control files.
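The first option above (stage the input to local scratch, run there, tar the results back to NFS) might look like this sketch; all paths and names here are examples, and it assumes ridft is on the PATH:

```shell
# Stage a job to node-local scratch, run it there, and copy a single
# tar.gz of the results back to the NFS directory (paths are examples).
run_on_scratch() {
    jobdir=$1                      # NFS directory holding coord/control
    scratch=/tmp/tm-job-$$         # per-job local scratch, named by PID

    mkdir -p "$scratch"
    cp "$jobdir"/coord "$jobdir"/control "$scratch"/
    ( cd "$scratch" && ridft > ridft.out 2>&1 )

    # One archive copies back to NFS faster than many small files.
    tar czf "$jobdir"/results.tar.gz -C "$scratch" .
    rm -rf "$scratch"
}
```

A master script would call run_on_scratch once per molecule directory; since the PID goes into the scratch name, concurrent jobs on the same node don't collide as long as each runs from its own shell process.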

Since you are going to write your own script anyway, you can of course do whatever you prefer. I just hope these hints are a bit helpful.

Regards,

Uwe