TURBOMOLE Users Forum
TURBOMOLE Modules => Jobex: Structure Optimization and Molecular Dynamics => Topic started by: golden on January 23, 2012, 05:35:30 PM
-
Hi, I ran a calculation using jobex:
nohup jobex -ri -c 200 -energy 8 actual -r > jobex.out &
For the moment it seems to be running, but it hasn't written anything new for ages. I tried to stop the calculation by creating a file called stop with "touch stop", but it is not responding at the moment.
I also tried to kill it from top, but that did not work either. My next option is to restart the computer, but I would rather not go there. Is there any other way to kill the calculation?
Thank you very much.
-
Hi,
When you used "top", which Turbomole executable was running at the moment? And what signal did you use to kill it? (9 should do it if the default, 15, does not help.) I've never seen a TM executable hang so badly that kill did not work.
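As a rough sketch of that escalation (15 first, then 9 only if the process is still around), something like the following could be used from a shell instead of top. The function name term_then_kill and the 10 second default grace period are just assumptions for illustration; nothing here is specific to TURBOMOLE.

```shell
#!/bin/sh
# Hypothetical helper: send SIGTERM (15), wait a little, then SIGKILL (9).
# Usage: term_then_kill PID [GRACE_SECONDS]
term_then_kill() {
    # Ask politely first; if the PID does not exist there is nothing to do.
    kill -s TERM "$1" 2>/dev/null || return 0
    i=0
    while [ "$i" -lt "${2:-10}" ]; do
        # kill -0 sends no signal, it only checks that the PID still exists.
        kill -0 "$1" 2>/dev/null || return 0
        sleep 1
        i=$((i + 1))
    done
    # Still there after the grace period: use signal 9, which cannot be caught.
    kill -s KILL "$1" 2>/dev/null || true
    return 0
}
```

With the PID shown by top this would be called as, for example, term_then_kill 3061 5.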
By the way, was the "actual -r" part really included in your command line or is that a typo? (actual -r is a separate command that should be executed in a TM job directory, not used as a jobex switch).
Regards,
Antti
-
Quote (Antti): When you used "top", which Turbomole executable was running at the moment? And what signal did you use to kill it? (9 should do it if the default, 15, does not help.) I've never seen a TM executable hang so badly that kill did not work.
rdgrad_mpi was running when I issued the kill command in top.
Yes, I think I used 15 (which is the default):
Kill PID 3061 with signal [15]:
As suggested, I then tried signal 9 to kill the job:
Kill PID 3061 with signal [15]: 9
It still did not work.
Quote (Antti): By the way, was the "actual -r" part really included in your command line or is that a typo? (actual -r is a separate command that should be executed in a TM job directory, not used as a jobex switch).
Please correct me on this; I did use actual -r in the command line, as follows:
nohup jobex -ri -c 200 -energy 8 actual -r > jobex.out &
If it's not supposed to be in the command line, how and where should I issue "actual -r"?
I really appreciate the help.
Thanks
-
Huh, sounds strange. Maybe if you list all TM-related processes and try to "kill" them starting from the bottom? (use kill -9 PID). For example:
[antti@compute-0-4 ~]$ ps -fu antti
UID PID PPID C STIME TTY TIME CMD
antti 10353 10333 0 09:24 ? 00:00:00 -ksh /opt/gridengine/default/spool/compute-0-4/job_scripts/16438
antti 10646 10353 0 09:24 ? 00:00:00 /bin/sh /joy/chem/turbomole/tm63/scripts/jobex -ri
antti 10759 10646 0 09:24 ? 00:00:00 /bin/sh /joy/chem/turbomole/tm63/bin/em64t-unknown-linux-gnu_mpi/ridft -l /joy/chem/turbomole/tm63/bin/em64t-unknown-linux-gnu_mpi
antti 10975 10759 0 09:24 ? 00:00:00 mpirun.mpich -f ./hp_mpi_appfile
antti 10978 10975 0 09:24 ? 00:00:00 /joy/chem/turbomole/tm63/mpirun_scripts/em64t-unknown-linux-gnu_mpi/HPMPI/bin/mpid ...
antti 11092 10978 0 09:24 ? 00:00:00 /joy/chem/turbomole/tm63/bin/em64t-unknown-linux-gnu_mpi/ridft_mpi
antti 11093 10978 96 09:24 ? 00:00:23 /joy/chem/turbomole/tm63/bin/em64t-unknown-linux-gnu_mpi/ridft_mpi
antti 11094 10978 97 09:24 ? 00:00:23 /joy/chem/turbomole/tm63/bin/em64t-unknown-linux-gnu_mpi/ridft_mpi
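The same "start from the bottom" idea can be scripted if there are many processes. This is only a minimal sketch: kill_tree is a made-up helper name (not a TURBOMOLE tool), and it assumes pgrep with the -P option (as shipped with procps on Linux) is available.

```shell
#!/bin/sh
# Hypothetical helper: kill a process and all of its descendants,
# deepest children first, then the process itself.
# Usage: kill_tree PID [SIGNAL]   (SIGNAL defaults to KILL, i.e. 9)
kill_tree() {
    # pgrep -P lists the direct children of the given parent PID.
    for child in $(pgrep -P "$1"); do
        kill_tree "$child" "${2:-KILL}"
    done
    # All descendants have been signalled; now signal the process itself.
    kill -s "${2:-KILL}" "$1" 2>/dev/null || true
}
```

In the listing above, calling kill_tree 10646 would go through the ridft_mpi processes, mpid, mpirun.mpich and the ridft wrapper before finally killing the jobex script itself.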
For the jobex command line, I would use the following:
nohup jobex -ri -c 200 -energy 8 > jobex.out &
(although -energy 8 sounds a bit tight for DFT)
You can run actual -r in the TM job directory at any time. For example, if your jobex job has crashed, you could issue actual -r (after fixing the real problem):
[antti@sandels tm_test]$ actual -r
ridft step seems to have been in serious trouble
[antti@sandels tm_test]$
Hope this helps,
Antti
-
Hi,
Quote (Antti): Huh, sounds strange. Maybe if you list all TM-related processes and try to "kill" them starting from the bottom? (use kill -9 PID). For example:
Thank you very much, it worked really nicely.
:) :) :)