Author Topic: spin-orbit problem  (Read 8200 times)

sazabi

spin-orbit problem
« on: May 17, 2010, 06:50:48 AM »
Hello,

Sorry to bother you again. One of our group members is running into the following problem during a spin-orbit calculation. Here is the information:

Spin-orbit in parallel:

I cannot run the example "ridft_short/BiH5.KRAMERS.GHF" in parallel.
I tried running it with 2 and 8 cores, but it fails after one iteration
with the error message shown below. The example runs fine without the
"soghf" option.
For reference the error message is:

ridft_mpi: Rank 0:2: MPI_Recv: Message truncated
MPI Application rank 2 exited before MPI_Finalize() with status 14
forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source
libc.so.6          0000003653E99730  Unknown               Unknown  Unknown
libc.so.6          0000003653ECCC74  Unknown               Unknown  Unknown
ridft_mpi          00000000009847F0  pa_rcv_nap_               220  pa_mp.f
ridft_mpi          00000000009956B8  ridftserver_              211  ridftserver.f
ridft_mpi          0000000000987FDD  serversub_                286  serversub.F
ridft_mpi          000000000098324D  pa_ninit_                  83  pa_slave.f
ridft_mpi          000000000091D9FB  conny_                    135  conny.f
ridft_mpi          000000000091A924  cntrlp_                   248  cntrlp.f
ridft_mpi          0000000000496E42  prelim_                   125  prelim.f
ridft_mpi          000000000040E8B6  MAIN__.P                  516  ridft.f
ridft_mpi          000000000040C2BC  Unknown               Unknown  Unknown
libc.so.6          0000003653E1D974  Unknown               Unknown  Unknown
ridft_mpi          000000000040C1EA  Unknown               Unknown  Unknown
forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source
libc.so.6          0000003653EB9D47  Unknown               Unknown  Unknown
libmtmpi.so.1      00002AADF975C041  Unknown               Unknown  Unknown
libmtmpi.so.1      00002AADF975C096  Unknown               Unknown  Unknown
libmtmpi.so.1      00002AADF96E1A0F  Unknown               Unknown  Unknown
libmtmpi.so.1      00002AADF96E1795  Unknown               Unknown  Unknown
libmtmpi.so.1      00002AADF971BFEC  Unknown               Unknown  Unknown
ridft_mpi          00000000009842DA  pa_rcv_                   147  pa_mp.f
ridft_mpi          00000000004C0AB6  parlp1_                   392  parlp1.f
ridft_mpi          00000000004BC0AA  colaux_                   155  colaux.f
ridft_mpi          00000000004EA5F5  griscf_.M                 432  griscf.f
ridft_mpi          0000000000426A61  MAIN__.P                 1605  ridft.f
ridft_mpi          000000000040C2BC  Unknown               Unknown  Unknown
libc.so.6          0000003653E1D974  Unknown               Unknown  Unknown
ridft_mpi          000000000040C1EA  Unknown               Unknown  Unknown


Thanks.


Can you give us some advice? Thank you!

Chris Bright

Re: spin-orbit problem
« Reply #1 on: January 29, 2013, 08:52:52 PM »
I have the same problem!!

ridft_mpi: Rank 0:5: MPI_Recv: Message truncated
MPI Application rank 5 exited before MPI_Finalize() with status 14
forrtl: error (78): process killed (SIGTERM)

etcetera etcetera...


PLEASE HELP!

(N.B. If I run the code in serial mode, the simulation works fine. With MPI, it stops after the first iteration.)

Chris Bright

Re: spin-orbit problem
« Reply #2 on: August 12, 2013, 06:31:11 PM »
No help?

antti_karttunen

Re: spin-orbit problem
« Reply #3 on: August 12, 2013, 06:47:21 PM »
Hi,

I think that at least in Turbomole 6.4 the spin-orbit calculations have not been MPI-parallelized. I haven't yet used 6.5, so I don't know whether the new SMP-parallelization schemes work for $soghf (I do hope they do!).

Best,
Antti

Hauke

Re: spin-orbit problem
« Reply #4 on: August 28, 2013, 04:33:26 PM »
At least with the 6.5 test version I still can't get an example input containing $soghf to run in parallel. When using SMP I still get
Quote
SEVERE ERROR from node:   0 par_set: name not found
and with MPI
Quote
...
ridft_mpi: Rank 0:3: MPI_Bcast: Message truncated
MPI Application rank 2 exited before MPI_Finalize() with status 14
forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source
libc.so.6          00002B78EAC1EC40  Unknown               Unknown  Unknown
...

In serial mode it works fine. Or does anybody know a trick to run it in parallel? Maybe one can fix this or at least add a more meaningful error message.
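
Until that happens, a job script could at least fall back to serial mode by itself whenever the input contains $soghf. What follows is only a rough sketch, not an official recipe: the helper name pick_mode is made up for illustration, and it assumes the keyword sits at the start of a line in the control file.

```shell
#!/bin/sh
# pick_mode (hypothetical helper): print "serial" if the given control file
# contains a line starting with the $soghf keyword, otherwise "parallel".
pick_mode() {
    if grep -q '^\$soghf' "$1"; then
        echo serial
    else
        echo parallel
    fi
}
```

In a job script one might then check whether pick_mode control prints "serial" and, if so, unset PARA_ARCH before calling ridft. This assumes the usual setup where PARA_ARCH (together with PARNODES) selects the parallel binaries.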