Topic: Problems with parallel RI-MP2 ...

Wheely

Problems with parallel RI-MP2 ...
« on: February 19, 2008, 04:49:21 PM »
Hello,

I have been trying to run parallel RI-MP2 calculations for quite some time. Compared with a calculation on a single CPU, the same calculation on 8 CPUs takes about 1.5 times longer. I don't understand what the problem is. The keywords $tmpdir and $sharedtmpdir are set in my control file and the job runs, but it takes more time! Can anyone help me?

antti_karttunen

Re: Problems with parallel RI-MP2 ...
« Reply #1 on: February 19, 2008, 08:10:36 PM »
Hello,

Troubleshooting the performance issues will be easier if you can give some additional information:

1) What kind of system are you trying to calculate (number of atoms and basis functions, point group symmetry if any)?
2) How are your computing facilities set up (are you trying to run the job on a single 8-processor machine or on 8 separate machines with one CPU each? Or something in between?)
3) Turbomole version, processor architecture, and operating system

Concerning point (1): If the system is small, parallelizing the calculation might not be useful at all. RI-MP2 calculations on systems with a few hundred basis functions are already very fast on one processor...

Concerning point (2): If you have 8 separate machines, how are the nodes connected to each other? Parallel RI-MP2 calculations with ricc2 exchange quite a lot of data between the nodes, and if the interconnect is slow, the communication can become a bottleneck. The manual discusses how to use the $mpi_param keyword to avoid such a situation (http://www.cosmologic.de/data/DOK_HTML/node196.html).

However, if you have a single 8-processor machine (or two four-processor machines), the disk I/O will consume some "extra" time, because all processes write their scratch data to the same disk simultaneously (in this case, a RAID0 disk configuration is very helpful).

Another important point is the $maxcor data group. Every process in a parallel calculation will consume the amount of memory specified in $maxcor. A (rough) example: When using a 2-CPU machine with 2 GB of memory, one could allocate something like 1500 MB in a serial calculation. But using $maxcor 1500 in a 2-CPU parallel calculation would be a very bad idea, because now the two computing processes together would use about 3000 MB of memory. Hence, the job would run out of physical memory and swap data to disk, which would drastically lower the performance.
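The arithmetic in the $maxcor example can be sketched in a few lines of Python. This is only an illustration, not part of Turbomole: the function names and the 75 % "usable fraction" are assumptions chosen to reproduce the rough numbers above (about 1500 MB out of 2 GB), and the 2 GB is rounded to 2000 MB.

```python
def maxcor_per_process(total_mb: int, n_processes: int,
                       usable_fraction: float = 0.75) -> int:
    """Rough per-process $maxcor value (in MB) that keeps the job in RAM.

    usable_fraction is an assumed share of physical memory left for the
    calculation after the operating system and other processes.
    """
    if n_processes < 1:
        raise ValueError("n_processes must be at least 1")
    if not 0.0 < usable_fraction <= 1.0:
        raise ValueError("usable_fraction must be in (0, 1]")
    usable_mb = int(total_mb * usable_fraction)
    # Every process allocates $maxcor on its own, so split the budget.
    return usable_mb // n_processes


def total_demand_mb(maxcor_mb: int, n_processes: int) -> int:
    """Total memory requested when each process uses maxcor_mb."""
    return maxcor_mb * n_processes


if __name__ == "__main__":
    print(maxcor_per_process(2000, 1))   # serial budget on the 2 GB machine
    print(maxcor_per_process(2000, 2))   # per-process budget with 2 CPUs
    print(total_demand_mb(1500, 2))      # what $maxcor 1500 on 2 CPUs asks for
```

With these assumptions, the 2-CPU job should use roughly half the serial $maxcor value per process; keeping $maxcor 1500 would request more memory than the machine has.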

So, many different things might cause the poor parallel performance you have encountered. However, the problems are certainly worth solving: thanks to the efficient implementation, the parallel RI-MP2 performance is absolutely fantastic when everything works as it should.
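To put a number on the slowdown reported in the first post, here is a minimal sketch of the usual speedup and parallel-efficiency figures. The function names are just illustrative, and the timings are relative units taken from the "1.5 times" statement, not measured values.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup = serial wall time divided by parallel wall time."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("wall times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, n_cpus: int) -> float:
    """Parallel efficiency = speedup divided by the number of CPUs."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return speedup(t_serial, t_parallel) / n_cpus


if __name__ == "__main__":
    # The 8-CPU job takes 1.5 times as long as the 1-CPU job.
    s = speedup(1.0, 1.5)
    e = efficiency(1.0, 1.5, 8)
    print(f"speedup = {s:.2f}, efficiency = {e:.1%}")
```

A speedup below 1 means the parallel run is slower than the serial one, which points to a setup problem (memory, disk, or interconnect) rather than to the normal overhead of parallelization.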