Topic: Efficiency of RICC2_smp

mradon

Efficiency of RICC2_smp
« on: July 26, 2011, 02:27:33 PM »
Dear Turbomole Developers,

I wonder if you have any benchmarks showing how efficient the OpenMP parallelization in ricc2_smp is?

With this module I have performed a few RI-CCSD(T) calculations, each nominally running on 4 threads. However, the actual CPU usage oscillated between 10% and 390%. The final timings
------------------------------------------------------------------------
         total  cpu-time :   1 days  6 hours 50 minutes and 33 seconds
         total wall-time :   1 days  3 hours 36 minutes and 59 seconds
------------------------------------------------------------------------
show, indeed, only a small difference between the cpu-time and the wall-time. Notably, the wall-time is shorter, but only slightly. As far as I am aware, ideal performance for a program running on 4 cores would mean a cpu-time 4 times larger than the wall-time. Is that right?
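
For reference, the ratio in question can be worked out from the timings printed above. This is only a minimal Python sketch: it assumes the reported cpu-time is summed over all threads, which may not hold on every platform, and the helper name `parse_duration` is made up for illustration.

```python
import re

def parse_duration(text):
    """Convert e.g. '1 days 6 hours 50 minutes and 33 seconds' to seconds."""
    units = {"days": 86400, "hours": 3600, "minutes": 60, "seconds": 1}
    total = 0
    for value, unit in re.findall(r"(\d+)\s+(days|hours|minutes|seconds)", text):
        total += int(value) * units[unit]
    return total

# Timings copied from the output above.
cpu = parse_duration("1 days  6 hours 50 minutes and 33 seconds")
wall = parse_duration("1 days  3 hours 36 minutes and 59 seconds")

# Assumption: if cpu-time were summed over 4 fully busy threads,
# this ratio would approach 4.
ratio = cpu / wall
print(f"cpu = {cpu} s, wall = {wall} s, cpu/wall = {ratio:.2f}")
# prints: cpu = 111033 s, wall = 99419 s, cpu/wall = 1.12
```

A ratio of about 1.1 instead of 4 is what prompted the question.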

Do you think there is any way to speed up my calculations by optimizing I/O? I suspect that the I/O setup in my configuration is still not optimal. But maybe I have already reached the intrinsic limit of ricc2_smp, and there is not much reason to optimize I/O further?

Your comments will be greatly appreciated. Thank you in advance!
Mariusz Radon, Ph.D., D.Sc.
Associate Professor
Faculty of Chemistry, Jagiellonian University, Krakow, Poland
E-mail: mradon@chemia.uj.edu.pl (mariusz.radon@uj.edu.pl)
Web: https://tungsten.ch.uj.edu.pl/~mradon
ORCID: https://orcid.org/0000-0002-1901-8521

Arnim

  • Developers
Re: Efficiency of RICC2_smp
« Reply #1 on: July 27, 2011, 08:46:51 PM »
Hi,

the difference between cpu-time and wall-time cannot be used to measure the parallel speedup. For the OpenMP version, what is actually collected in the cpu-time depends on the platform and on the compiler. The wall-time should be ok, though. But for the real speedup, you have to run the program on one thread and then on four threads.
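
A minimal sketch of that comparison in Python, using wall-times only. The function name and the two wall-times are hypothetical placeholders for illustration, not measured ricc2_smp results.

```python
def parallel_speedup(wall_serial, wall_parallel, n_threads):
    """Return (speedup, efficiency) from two wall-times in seconds."""
    speedup = wall_serial / wall_parallel   # how many times faster the parallel run is
    efficiency = speedup / n_threads        # 1.0 would be ideal scaling
    return speedup, efficiency

# Hypothetical wall-times in seconds (NOT measurements): the same job
# run once on 1 thread and once on 4 threads.
speedup, efficiency = parallel_speedup(36000.0, 12000.0, 4)
print(f"speedup = {speedup:.2f}, efficiency = {efficiency:.0%}")
# prints: speedup = 3.00, efficiency = 75%
```

Both runs should use the same input, machine, and scratch setup, so that only the thread count differs.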

If the CPU usage is too often 10% or lower, it might be an indication that the setup of your scratch disk is not optimal. E.g. always make sure not to run on NFS partitions.
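
One way to check which filesystem a scratch directory sits on is sketched below in Python. It assumes a Linux system where `/proc/mounts` is available; the function name, the `NETWORK_FS` set (deliberately not exhaustive), and the use of the current working directory as the scratch location are all assumptions made for illustration.

```python
import os

# Assumption: a small, non-exhaustive set of network filesystem types.
NETWORK_FS = {"nfs", "nfs4", "cifs"}

def fs_type_for(path, mounts_text):
    """Return (mount_point, fs_type) of the deepest mount containing an absolute path."""
    best = ("", "unknown")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Keep the longest matching mount point; later entries win ties.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best[0]):
            best = (mount_point, fs_type)
    return best

if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    scratch = os.path.realpath(os.getcwd())  # assumed scratch directory
    with open("/proc/mounts") as fh:
        mount_point, fs_type = fs_type_for(scratch, fh.read())
    note = " (network filesystem)" if fs_type in NETWORK_FS else ""
    print(f"{scratch} is on {mount_point} [{fs_type}]{note}")
```

The sketch ignores mount points containing escaped characters such as spaces, which `/proc/mounts` encodes specially.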

Hope that helps,

Arnim

mradon

Re: Efficiency of RICC2_smp
« Reply #2 on: July 29, 2011, 02:16:44 PM »
> The wall-time should be ok, though. But for real speedup, you have
>  to run the program on one and then on four threads.

We will of course run these tests (1 core vs. 4 cores, etc.).
Do you have any results from such parallel benchmarks? It would be very nice to see them on the Turbomole webpage.

> If the CPU usage is too often 10% or lower, it might be an indication that the setup
> of your scratch disk is not optimal. E.g. always make sure not to run on NFS partitions.

No NFS for sure ;) It's Lustre connected through InfiniBand, so quite a fast configuration. But I suspect that the I/O rate is still not optimal, so we will perhaps try a local disk or try to optimize the filesystem further.

Thank you for your answer.