Author Topic: parallel via PBS freezes (edited)  (Read 6831 times)

jbaltrus

  • Full Member
  • ***
  • Posts: 71
  • Karma: +0/-0
parallel via PBS freezes (edited)
« on: June 05, 2009, 03:05:03 AM »
Well, with Uwe's help I got everything installed and running in sequential mode. Parallel, however, misbehaves.

I run a Rocks 5.1 cluster with 3 compute nodes, each with one 4-core processor, and submit jobs via PBS. When I submit my optimization job, it aborts after the first step converges, but there is no clear message explaining why. All I get is:

 
fine, there is no data group "$actual step"
next step = rdgrad


Needless to say, all the ulimit options are set right on all the nodes.

What is most confusing is that it doesn't abort with error files; the processors just stop running ridft_mpi jobs and everything sits idle.

Anything would help.

Jonas

 
« Last Edit: June 05, 2009, 11:20:16 PM by jbaltrus »

jbaltrus

Re: parallel via PBS freezes (edited)
« Reply #1 on: June 05, 2009, 11:56:28 PM »
Yep, just checked: in a sequential run the optimization proceeds normally. Is there no parallel rdgrad? Nope, there is a parallel version, so I don't know what's going on.

Help, anybody?

Jonas
« Last Edit: June 06, 2009, 01:14:46 AM by jbaltrus »