Topic: Efficiency of RICC2_smp (Read 5554 times)

mradon · « **on:** July 26, 2011, 02:27:33 PM »

Dear Turbomole Developers,

I wonder whether you have any benchmarks showing how efficient the OMP parallelization in ricc2_smp is?

Using this module, I have run a few RI-CCSD(T) calculations, each nominally running on 4 threads. However, the actual CPU usage oscillated between 10% and 390%. The final timings
------------------------------------------------------------------------
total cpu-time : 1 days 6 hours 50 minutes and 33 seconds
total wall-time : 1 days 3 hours 36 minutes and 59 seconds
------------------------------------------------------------------------
show, indeed, only a small difference between the CPU time and the wall time; notably, the wall time is the shorter of the two, but only slightly. As far as I understand, a program running ideally on 4 cores would have a CPU time 4 times larger than its wall time. Is that right?
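As a quick check of the arithmetic, the two timings above can be converted to seconds and their ratio taken. This is a minimal sketch; `parse_duration` is a hypothetical helper (not part of Turbomole) that assumes the "N days N hours N minutes and N seconds" wording shown in the output above. Whether the reported CPU time really sums over all threads depends on the platform and compiler, so the ratio is only indicative.

```python
import re

# Seconds per unit as spelled in the timing lines; the regex below accepts
# both singular and plural forms ("1 day" or "2 days").
_UNIT_SECONDS = {"day": 86400, "hour": 3600, "minute": 60, "second": 1}


def parse_duration(text: str) -> int:
    """Convert e.g. '1 days 6 hours 50 minutes and 33 seconds' to seconds."""
    total = 0
    for value, unit in re.findall(r"(\d+)\s+(day|hour|minute|second)s?", text):
        total += int(value) * _UNIT_SECONDS[unit]
    return total


cpu = parse_duration("1 days 6 hours 50 minutes and 33 seconds")
wall = parse_duration("1 days 3 hours 36 minutes and 59 seconds")

# If the CPU time summed the work of all threads, this ratio would approximate
# the average number of busy cores (ideally close to 4 for 4 threads).
ratio = cpu / wall
print(f"cpu = {cpu} s, wall = {wall} s, cpu/wall = {ratio:.2f}")
```

For the posted run this gives a ratio of about 1.12, far from the ideal value of 4, although, as the reply below explains, this ratio is not a reliable speedup measure for the OpenMP build.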

Do you think there is any way to speed up my calculations by optimizing I/O? I suspect that the I/O overhead may still not be optimal in my configuration. Or have I already reached the intrinsic limit of ricc2_smp, so that there is little point in optimizing I/O?

Your comments will be greatly appreciated. Thank you in advance!

Arnim · « **Reply #1 on:** July 27, 2011, 08:46:51 PM »

Hi,

the difference between CPU time and wall time cannot be used to measure the parallel speedup. For the OpenMP version, what is actually collected in the CPU time depends on the platform and on the compiler. The wall time should be reliable, though. To measure the real speedup, you have to run the program once on one thread and then on four threads, and compare the wall times.
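For reference, the comparison suggested here reduces to two numbers computed from wall times only: the speedup S = T(1)/T(4) and the parallel efficiency S/4. This is a minimal sketch; `speedup_and_efficiency` is a hypothetical helper, and the times in the example are placeholders, not measured ricc2_smp results.

```python
from typing import Tuple


def speedup_and_efficiency(wall_1_thread: float, wall_n_threads: float,
                           n_threads: int) -> Tuple[float, float]:
    """Return (speedup, parallel efficiency) from wall-clock times in seconds.

    Only wall times are used, since what counts as CPU time for an
    OpenMP program depends on the platform and compiler.
    """
    if wall_1_thread <= 0 or wall_n_threads <= 0 or n_threads < 1:
        raise ValueError("times must be positive and n_threads >= 1")
    speedup = wall_1_thread / wall_n_threads
    efficiency = speedup / n_threads  # 1.0 would be perfect scaling
    return speedup, efficiency


if __name__ == "__main__":
    # Placeholder wall times for illustration only (not real benchmark data).
    s, e = speedup_and_efficiency(wall_1_thread=4000.0, wall_n_threads=1250.0, n_threads=4)
    print(f"speedup = {s:.2f}, efficiency = {e:.0%}")
```

With these placeholder values it reports a speedup of 3.20 and an efficiency of 80%; an efficiency well below 100% would point to serial or I/O-bound phases of the run.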

If the CPU usage is too often at 10% or lower, it might be an indication that the setup of your scratch disk is not optimal. For example, always make sure not to run on NFS partitions.
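One way to follow the NFS advice is to check which filesystem type the working/scratch directory actually lives on. This is a minimal Linux-only sketch, assuming `/proc/mounts` is available; `filesystem_type` is a hypothetical helper, and the mount-table lines in the example are made up. Mount points containing spaces (escaped as `\040` in `/proc/mounts`) are not handled.

```python
import os
from typing import Optional


def filesystem_type(path: str, mounts_text: Optional[str] = None) -> str:
    """Return the filesystem type (e.g. 'nfs4', 'lustre', 'ext4') holding `path`.

    Parses the Linux mount table, whose lines have the form
    'device mountpoint fstype options dump pass'. The longest mount point
    that contains the resolved path wins; for equal mount points the later
    line wins, because later mounts shadow earlier ones.
    """
    if mounts_text is None:
        with open("/proc/mounts") as f:
            mounts_text = f.read()
    target = os.path.realpath(path)
    best_mount, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # Match the mount point itself or any path below it ('/home' must not match '/homework').
        prefix = mount_point.rstrip("/") + "/"
        inside = target == mount_point or target.startswith(prefix)
        if inside and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


if __name__ == "__main__":
    # Assumes the job's scratch files are written to the current directory.
    print(filesystem_type(os.getcwd()))
```

A result starting with `nfs` would be a warning sign; the same check also distinguishes a Lustre mount from a local disk.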

Hope that helps,

Arnim

mradon · « **Reply #2 on:** July 29, 2011, 02:16:44 PM »

> The wall-time should be ok, though. But for real speedup, you have
> to run the program on one and then on four threads.

We will of course run these tests (1 core vs. 4 cores, etc.).
Do you have any results from such parallel benchmarks yourselves? It would be very nice to see them on the Turbomole webpage.

> If the CPU usage is too often 10% or lower, it might an indication the setup
> of your scratch disk is not optimal. E.g. always make sure not to run on NFS partitions.

No NFS, for sure.

It's Lustre connected through InfiniBand, so quite a fast configuration. But I suspect that the I/O rate is still not optimal, so we will perhaps try a local disk or further optimize the file system.

Thank you for your answer.

TURBOMOLE Users Forum
