Author Topic: [SOLVED] Run Numforce calculations with RI utilizing non-RI optimized structures

Glxblt76

  • Full Member
  • ***
  • Posts: 29
  • Karma: +0/-0
8) NOTE TO READER: This problem (which involved several subproblems), and which turned out to be quite complicated for a young theoretical chemist like me, has been solved thanks to the kind help of uwe and antti_karttunen!  ;D The solution is written up in this post.

One reason to run RI frequency calculations on non-RI optimized structures is that you did your geometry optimizations without RI in the COSMO solvation model and now want to validate the geometries with frequency calculations. Vibrational frequencies of COSMO-optimized structures can only be evaluated numerically!  :-[, that is, using the NumForce script. The problem is that NumForce takes ages to calculate vibrational frequencies for large molecules!! Using NumForce with RI can cut this calculation time by a factor of about ten. For my structures, the vibrational frequencies with and without RI agreed to within about 1 cm-1 for the exact same geometry.

So, you might want to use RI to validate your geometries even if you avoided using it during geometry optimization.


To summarize the problem that I wish to address here: simply typing the Linux command

nohup NumForce -ri > NumForce.out &

while in the folder of a molecule obtained through a non-RI method will cause you loads of trouble and baffling/cryptic error messages  :o :'( ::). The purpose of this post is to troubleshoot/fix/solve this specific issue.

The problems you are likely to encounter if you try to calculate RI frequencies with NumForce from non-RI structures generally come down to NumForce not finding some required files.


Main steps to troubleshoot/fix/solve the problems:
1 modify calculate script (slightly)
2 do single-point calculation with RI from non-RI-optimized structure using the modified calculate script
3 generate gradient file and add it to the folder containing the single-point-ed structure

Detailed walkthrough to calculate RI frequencies from Numforce using structures optimized without RI
(assuming that you already have one folder for each non-RI converged structure)

Note 1: in this example, I assume you used the BP86 functional, the TZVP basis set, and the COSMO solvation model during the geometry optimization without RI (a method which may be termed BP-TZVP-COSMO)

Note 2 : Each time, when I say "go into each folder", "go into each subfolder", "for each folder", "for each subfolder" and things like that, a script or an iterative Linux command can be used which looks like this:

for dir in */
do
.......commands.......
done


This lets you treat all molecules in one go and avoids painfully repetitive command-line work.
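As a slightly more concrete sketch of such a loop (the name for_each_dir is my own hypothetical helper, not a Turbomole command), each folder can be entered in a subshell, so that a failed cd never leaves you in the wrong directory:

```shell
# for_each_dir: hypothetical helper, not part of Turbomole.
# Runs the given command inside every subdirectory of the current folder.
for_each_dir() (
    for dir in */; do
        # If there are no subfolders, the glob stays literal: skip it.
        [ -d "$dir" ] || continue
        # The inner subshell keeps the cd local to this iteration.
        ( cd "$dir" && "$@" ) || echo "command failed in $dir" >&2
    done
)

# Usage (assumption: run from the folder holding one subfolder per molecule):
#   for_each_dir pwd
```

Any command can be passed in place of pwd; the helper only takes care of entering and leaving the folders.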

- First you need to modify the calculate script as indicated by uwe:
Quote
Open the file $TURBODIR/calculate_2.4_linux64/TURBO.pm and search for the lines

   foreach my $f (@files) {
      (-e "$f") && (unlink($f));
   }
   ($unix==1) && system"gzip \"$mdir\"/* >/dev/null 2>/dev/null";


add a # in front of each of the lines and save the file.
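If you prefer not to edit the file by hand, the commenting-out can be sketched with sed. This is only a sketch under assumptions: comment_out_cleanup is a hypothetical helper name, and it assumes that the foreach line quoted above occurs only once in TURBO.pm and is followed by the gzip line, so check the result with diff before using it. It is safest to work on your own copy of the Turbomole installation.

```shell
# comment_out_cleanup: hypothetical helper, not part of Turbomole.
# Prints FILE to stdout with a '#' put in front of every line from the
# 'foreach my $f (@files)' line through the next line containing 'gzip'.
# FILE itself is never modified.
comment_out_cleanup() {
    sed '/foreach my \$f (@files)/,/gzip/ s/^/#/' "$1"
}

# Usage (keeps the untouched file as TURBO.pm.orig):
#   cd "$TURBODIR/calculate_2.4_linux64"
#   cp TURBO.pm TURBO.pm.orig
#   comment_out_cleanup TURBO.pm.orig > TURBO.pm
#   diff TURBO.pm.orig TURBO.pm   # only the four lines above should differ
```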

- Create a folder for your RI NumForce frequency calculations, for example:

mkdir freq_ri

- Then, generate a .xyz file for each of your optimized structures.

To do so, go into each folder (optimized structure), and use the following Linux commands:

t2x coord > [name_of_molecule].xyz
mv [name_of_molecule].xyz [folder/of/your/choice/for/ri/freq/calc]

For example, if the molecule is water and the folder freq_ri is in the folder where all your optimized molecules are stored, go into the folder of water and type:

t2x coord > water.xyz
mv water.xyz ../freq_ri
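For many molecules, the two commands above can be wrapped in a loop. The following is a minimal sketch, assuming one subfolder per molecule, each containing a coord file, and that the folder name is a good name for the molecule; export_xyz is a hypothetical helper name.

```shell
# export_xyz: hypothetical helper, not part of Turbomole.
# For every subfolder of the current folder that contains a coord file,
# writes <folder name>.xyz into the target folder using Turbomole's t2x.
export_xyz() (
    target=$1
    mkdir -p "$target" || exit 1
    for dir in */; do
        name=${dir%/}
        # Folders without a coord file (such as freq_ri itself) are skipped.
        [ -f "$dir/coord" ] || continue
        # t2x runs inside the molecule folder; the output file is opened
        # relative to the starting folder.
        ( cd "$dir" && t2x coord ) > "$target/$name.xyz"
    done
)

# Usage (from the folder that holds all optimized molecules):
#   export_xyz freq_ri
```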

- Do a single-point calculation with RI

To do so, go to the folder (using, for example cd ../freq_ri) and use these Linux commands:
ls *.xyz > list
calculate -l list -m BP-TZVP-COSMO-SP &> calculate-list.log &


which runs a single-point calculation with RI for each of your molecules (.xyz files), one after the other.

- Generate gradients

To do so, once all calculate jobs are finished, use the command

rdgrad &> rdgrad.out &

for each subfolder of the folder SolutionBP-TZVP, which is contained in the freq_ri folder of our example.
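A minimal sketch of this step as a loop follows. Unlike the command above, it runs rdgrad in the foreground, one molecule after the other, so that the jobs do not all compete for the machine at once. run_rdgrad_all is a hypothetical helper name, and the check for a control file is my own way of recognising a Turbomole job folder.

```shell
# run_rdgrad_all: hypothetical helper, not part of Turbomole.
# Runs rdgrad in every subfolder of the given folder that holds a control
# file, one after the other, writing the output to rdgrad.out in each.
run_rdgrad_all() (
    cd "$1" || exit 1
    for dir in */; do
        [ -f "$dir/control" ] || continue
        ( cd "$dir" && rdgrad > rdgrad.out 2>&1 ) \
            || echo "rdgrad failed in $dir" >&2
    done
)

# Usage (from freq_ri, with the folder name used in this example):
#   run_rdgrad_all SolutionBP-TZVP
```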

- At this point, you can launch the frequency calculations using RI.

Case 1.
If you have only one core and/or want to avoid problems related to SMP/MPI computing (which are "easy" to solve once you have some experience but painful for the newcomer), simply go into each subfolder of freq_ri and enter the command:

nohup NumForce -ri > NumForce.out &

But take care not to overwhelm your computer since Numforce will quickly saturate your RAM!!

Case 2.
If you have a multi-core machine and want to use all of its cores (the "nodes" listed in the mfile described here), script it.
Please note that, depending on your machine and on the settings of your account on that machine, permission problems may occur when doing what I propose here. Searching Google (Stack Overflow entries are generally reliable in that respect) and asking your IT staff may help to solve problems that are specific to your machine.
In particular, issues related to public-key based authentication may occur. If so, search for "public-key based authentication". Setting this up fixed some permission problems I had with SMP computing. It took me painful hours to figure out that this was the solution to my permission problem, and I sincerely want to save your precious time ;)


For example, write a file called "mfile" containing the hostname of your machine, written once for each core you want to use. Mine has 8, so the file "mfile" contains this:

theophile-HP-Z420-Workstation
theophile-HP-Z420-Workstation
theophile-HP-Z420-Workstation
theophile-HP-Z420-Workstation
theophile-HP-Z420-Workstation
theophile-HP-Z420-Workstation
theophile-HP-Z420-Workstation
theophile-HP-Z420-Workstation


"theophile-HP-Z420-Workstation" being the name of my machine.

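Rather than typing the hostname by hand, the mfile can be generated. A minimal sketch follows; write_mfile is a hypothetical helper name, and the number of entries is whatever you decide to use.

```shell
# write_mfile: hypothetical helper, not part of Turbomole.
# Prints the host name N times, one per line.
# $1 = number of entries, $2 = host name (defaults to the output of hostname).
write_mfile() (
    n=$1
    host=${2:-$(hostname)}
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "$host"
        i=$((i + 1))
    done
)

# Usage (8 entries for the current machine):
#   write_mfile 8 > mfile
```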
Then, you can use the following script:

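#!/bin/bash
# Run NumForce with RI in each subfolder, one molecule after the other.
for dir in */
do
        echo "$dir"
        # Skip the folder if it cannot be entered, so that "cd .." stays balanced.
        cd "$dir" || continue
        nohup NumForce -ri -mfile ../mfile -scrpath /tmp > NumForce.out &
        # Wait for this NumForce run to finish before starting the next one.
        wait
        cd ..
done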


Copy it into a file launch_freq_only_ri.sh saved in the freq_ri folder, and then make it executable with the command

chmod u+x launch_freq_only_ri.sh

(chmod 777 would also work, but it makes the script writable by every user of the machine, which is not needed to run it.)

Then simply use the command:

./launch_freq_only_ri.sh

Your frequency calculations will then run one after the other, each using the number of entries listed in your mfile.
_________________________________________________________
ORIGINAL POST:


Dear users and support team members,

I cannot get geometry optimizations to run at the BP86-def2-TZVP-RI level with the COSMO solvation model.

Previously, I wrote a script that starts from .xyz files, automatically uses define to create input folders for Turbomole, and runs geometry optimizations at the BP86-def2-TZVP level with the COSMO solvation model. Without RI there was no problem at all.

The line I used to call jobex was:

Code: [Select]
jobex -c 100 -mem 2GB > opt.out &
Then, I modified the script to activate RI, in the "define" menu, and I modified the line to call jobex as:

Code: [Select]
jobex -ri -c 100 -mem 2GB > opt.out &
To run "define" automatically, I feed it a "define.in" file (complete command line: define coord < ../define_ri.in) which contains:
Code: [Select]
a coord
desy
ired
*
b all def2-TZVP
*
eht
y
0
y
dft
on

drv
sec
*
*
*
*

This is the new "define.in" file after adding commands to use RI, that is, "ri" "on" and "[enter]"
Code: [Select]
a coord
desy
ired
*
b all def2-TZVP
*
eht
y
0
y
dft
on

ri
on

drv
sec
*
*
*
*


Now when I try to launch jobex with the line
Code: [Select]
jobex -ri -c 100 -mem 2GB > opt.out &
I get in "slave1.output" the following complete message:
Code: [Select]
<<<<<<<<<<<<<<< OUTPUT FROM PROCESS                      0 >>>>>>>>>>>>>>>
 distribution of control by ridft_mpi/rdgrad_mpi
 operating system is UNIX !
 hostname is         theophile-HP-Z420-Workstation

 data group $actual step is not empty
 due to the abend of ridft


 this is only a subsequent error message
 check reason for abend in the other output files ...

 use the command  'actual -r'  to get rid of that

 quit: process                      0  failing ...
 MODTRACE: no modules on stack

  CONTRL dead = actual step
 rdgrad ended abnormally

How can I troubleshoot this?

It says "use the command 'actual -r'" but I don't see where exactly I should add this command?

Any help would be greatly appreciated.

Best regards!  ;D
« Last Edit: January 27, 2016, 07:19:57 PM by Glxblt76 »

antti_karttunen

  • Sr. Member
  • ****
  • Posts: 229
  • Karma: +1/-0
Re: ridft: "data group $actual step is not empty"
« Reply #1 on: November 25, 2015, 11:48:46 AM »
Hi,

To troubleshoot this, you need to find the real error message (it is buried somewhere). Please see if job.last or job.1 or possibly job.N (N=number) contain any error messages.

actual -r needs to be executed at the command line, but before this you need to find out the actual problem.
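A quick way to search those files is sketched here. find_job_errors is a hypothetical helper name, and the two search words are only a guess at what the real message contains, taken from the output quoted in this thread:

```shell
# find_job_errors: hypothetical helper, not part of Turbomole.
# Lists, with file name and line number, every line in job.last and
# job.<N> of the current folder that mentions an error or an abnormal end.
find_job_errors() (
    for f in job.last job.[0-9]*; do
        # Skip names that do not exist (unmatched glob stays literal).
        [ -f "$f" ] || continue
        grep -i -n -H -e 'error' -e 'abnormally' "$f"
    done
    exit 0
)

# Usage (inside the folder where jobex was started):
#   find_job_errors
```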

Antti

uwe

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 569
  • Karma: +0/-0
Re: ridft: "data group $actual step is not empty"
« Reply #2 on: November 25, 2015, 12:08:35 PM »
Hi,

thanks Antti!

Glxblt76:

To run jobs in an automated or scripted fashion, especially for RI-DFT jobs with and without COSMO, I'd recommend using the calculate script. It is part of each Turbomole distribution and, under Linux, is in your default Turbomole path.

Batch jobs can also be set up and started from within the graphical user interface TmoleX, by the way.

The calculate script does generate the input for a list of molecules and also directly starts the calculations. All you need is a list of files in various formats (xyz, sdf, pdb, ...). To create your own method, copy and modify one of the predefined definition files.

See $TURBODIR/calculate_2.4_linux64/calculate_manual.pdf for more details.

Regards,

Uwe

Glxblt76

Re: ridft: "data group $actual step is not empty"
« Reply #3 on: November 26, 2015, 08:13:43 AM »
Thanks both of you for your kind answers and the time spent on my problem.

Dear Uwe,

I know that the calculate script exists, and I will try to use it for my present problem, but before it was easier for me to write my own procedure, so that I could tune each parameter in detail (and also to better understand how to do such automation since, IMO, a PhD is research for your project, but also learning ;))

EDIT : I started the calculate script for my present structures and I wonder: is there any way to use all my cores from the calculate script? I don't see how to do it in the user manual. My previous script was written in such a way that I could use all cores of my machine simultaneously to speed up the optimization of my numerous (i.e. hundreds of) structures containing 40-140 atoms.

@Antti: This is the content of the master file:
Code: [Select]
  -------------- paraga: parallel mode  -------------
  -------------- paraga: parallel mode  -------------
Platform-MPI licensed for TURBOMOLE.

 rdgrad ended abnormally
SEVERE ERROR from node:   0  CONTRL dead = actual step
SEVERE ERROR from node:   0  CONTRL dead = actual step
 ABORTING
0:0:GA Aborting:: 1
(rank:0 hostname:theophile-HP-Z420-Workstation pid:6531):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
Last System Error Message from Task 0:: Inappropriate ioctl for device
MPI Application rank 0 exited before MPI_Finalize() with status 13
Last System Error Message from Task 1:: Inappropriate ioctl for device
forrtl: error (78): process killed (SIGTERM)
forrtl: error (78): process killed (SIGTERM)
Last System Error Message from Task 4:: Inappropriate ioctl for device
forrtl: error (78): process killed (SIGTERM)
Last System Error Message from Task 7:: Inappropriate ioctl for device
forrtl: error (78): process killed (SIGTERM)
Last System Error Message from Task 6:: Inappropriate ioctl for device
forrtl: error (78): process killed (SIGTERM)
Last System Error Message from Task 5:: Inappropriate ioctl for device
forrtl: error (78): process killed (SIGTERM)

I don't know exactly what this means, but my machine is SMP and not MPI if I remember correctly, since I have several cores on one node and not several nodes. Might this be the cause?

Best regards
« Last Edit: November 26, 2015, 08:32:06 AM by Glxblt76 »

uwe

Re: ridft: "data group $actual step is not empty"
« Reply #4 on: November 26, 2015, 04:16:25 PM »
Hello,

calculate works with lists of molecules. If you simply generate a certain number of lists, you can start them with calculate at the same time in the same directory. For example, a command like

split --number=r/4 list

will generate files xaa, xab, xac, ..., with the lines of the file list distributed more or less equally over the number of files given in the --number option of split. Then,

unset PARA_ARCH
for i in x??
do
  nohup calculate -l $i -m <your_method>  &> calculate-$i.log &
done


To your second question: The parallel SMP version of ridft and rdgrad by default uses an MPI parallelization plus GlobalArrays to store common data in shared memory.

An error message in the master file which looks like this:

Quote
ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0

usually indicates that you have asked for too much memory. Please reduce $ricore to a smaller value (200 or not more than 1000), remove all your files in /dev/shm/ and restart. Make sure that your user limits, especially for stack size, and also the maximum amount of shared memory which you are allowed to use are sufficiently large.
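To see where you stand before changing anything, the limits and the shared-memory folder can be inspected with standard shell tools. This is only a sketch: show_limits is a hypothetical helper name, and what counts as "sufficiently large" depends on your system and your molecules.

```shell
# show_limits: hypothetical helper, not part of Turbomole.
# Prints the current stack size limit and, where /dev/shm exists, how full
# it is and which entries in it belong to the current user.
show_limits() {
    echo "stack size limit: $(ulimit -s)"
    if [ -d /dev/shm ]; then
        df -h /dev/shm
        echo "your entries in /dev/shm:"
        find /dev/shm -user "$(id -un)" 2>/dev/null
    fi
    return 0
}

# Usage:
#   show_limits
```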

If that still does not work (for example if your system administrator did not allow users to utilize sufficient amount of shared memory), try to set

export TM_PAR_FORK=yes

and then rerun the job.

Uwe

Glxblt76

Re: ridft: "data group $actual step is not empty"
« Reply #5 on: November 30, 2015, 04:21:48 PM »
Dear Uwe,

Thanks for advice about use of calculate in parallel. Now I have a question: Is it proper to calculate vibrational frequencies using RI-DFT from geometries that were optimized without RI?

In fact I run into technical difficulties when I try to do so. It seems that NumForce called with the option "-ri" needs gradients computed by rdgrad. I therefore modified the options of calculate to do single points at the BP-def2-TZVP-RI-COSMO level and then tried to calculate gradients using rdgrad, but rdgrad needs a filled "mos" file and the "mos" file left by the calculate script is empty. I then used define to generate guess orbitals to see whether that would fix the problem, but it (of course) told me that the orbitals were not orthonormal and stopped with a "SEVERE ERROR".

So, at this point, from my BP-def2-TZVP-COSMO obtained xyz files, I wasn't able to do NumForce calculations using RI. :'(

Do I have to redo all my ~350 geometry optimizations with RI in order to use NumForce with RI? From recent tests it seems that RI cuts the calculation time for my molecules by a factor of 5 to 15.

Best regards,
Any help would be warmly appreciated.  ;D
« Last Edit: November 30, 2015, 04:28:53 PM by Glxblt76 »

uwe

Re: ridft: "data group $actual step is not empty"
« Reply #6 on: December 01, 2015, 12:47:06 PM »
Hi,

did the method you used perform a geometry optimization? If yes, there should be a (gzipped) gradient file in the directory where the job was running. In that case NumForce should recognize it. Otherwise a single-point energy and gradient calculation is indeed needed.

But you are right, it is a bit unfortunate that calculate removes the orbitals by default. There is also no option to prevent calculate from doing so... I'd recommend modifying the calculate script if you want to use it for purposes other than the generation of COSMO files for COSMOtherm.
Open the file $TURBODIR/calculate_2.4_linux64/TURBO.pm and search for the lines

   foreach my $f (@files) {
      (-e "$f") && (unlink($f));
   }
   ($unix==1) && system"gzip \"$mdir\"/* >/dev/null 2>/dev/null";


add a # in front of each of the lines and save the file.

It's probably a good idea to get your own copy of the Turbomole installation and to make the modifications there.

Then calculate will neither remove the mos file nor will it zip the result files.

All this is independent of whether you use RI or not. I see no reason not to use RI.

Regards,

Uwe



Glxblt76

Re: ridft: "data group $actual step is not empty"
« Reply #7 on: January 18, 2016, 10:12:09 AM »
Dear Uwe,

Thanks for this piece of advice. Over the last two months I had some hardware problems which prevented me from working on the problem presented above. Now my hardware problems are fixed, so I will check whether commenting out the mos-removing lines in the calculate script solves my problem.

For my present purposes, the most important thing is that I find a structure which is a local minimum of the PES. I don't need excessively time-consuming searches for the true global minimum, I just want to ensure that the structures I guessed from experimental, hydrogen-bonding and steric arguments are reasonable. So, if a structure optimized without RI is a local minimum (i.e. no imaginary frequencies) according to NumForce without RI, would it also be one according to NumForce with RI?

To answer your question, my previous method did a geometry optimization, but the result is a "gradient" file and not an "rdgrad" file. I don't know if that is the cause, but sometimes NumForce crashes and sometimes it seems to work.

« Last Edit: January 18, 2016, 11:03:13 AM by Glxblt76 »

Glxblt76

Re: ridft: "data group $actual step is not empty"
« Reply #8 on: January 22, 2016, 11:38:43 AM »
Double post to say that the problem seems to be solved. The full walkthrough is in the first post of this thread.

« Last Edit: January 27, 2016, 07:20:51 PM by Glxblt76 »