TURBOMOLE Users Forum
TURBOMOLE Modules => Ricc2 => Topic started by: mradon on July 26, 2011, 02:27:33 PM
-
Dear Turbomole Developers,
I wonder if you have some benchmarks showing how efficient the OpenMP parallelization in ricc2_smp is?
With this module I have performed a few RI-CCSD(T) calculations, each nominally running on 4 threads. However, the actual CPU usage oscillated between 10% and 390%. The final timings
------------------------------------------------------------------------
total cpu-time : 1 days 6 hours 50 minutes and 33 seconds
total wall-time : 1 days 3 hours 36 minutes and 59 seconds
------------------------------------------------------------------------
show, indeed, only a small difference between the cpu-time and the wall-time. Notably, the wall-time is shorter (but only slightly). As far as I am aware, the ideal performance of a program running on 4 cores would be a cpu-time 4 times larger than the wall-time. Is that right?
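To put a number on this, here is a minimal Python sketch that converts the two timing lines quoted above into seconds and takes their ratio. The helper names (parse_seconds, read_timings) are made up for illustration and are not part of TURBOMOLE. Whether this ratio says anything about parallel efficiency depends on what the cpu-time counter actually collects, so treat it as a rough indicator only.

```python
import re

# Timing lines exactly as quoted in the post above.
REPORT = """\
total cpu-time : 1 days 6 hours 50 minutes and 33 seconds
total wall-time : 1 days 3 hours 36 minutes and 59 seconds
"""

UNIT_SECONDS = {"days": 86400, "hours": 3600, "minutes": 60, "seconds": 1}


def parse_seconds(text):
    """Convert e.g. '1 days 6 hours 50 minutes and 33 seconds' to seconds."""
    total = 0.0
    pattern = r"(\d+(?:\.\d+)?)\s*(days|hours|minutes|seconds)"
    for value, unit in re.findall(pattern, text):
        total += float(value) * UNIT_SECONDS[unit]
    return total


def read_timings(report):
    """Return {'cpu': seconds, 'wall': seconds} from the timing summary."""
    timings = {}
    for line in report.splitlines():
        match = re.match(r"\s*total\s+(cpu|wall)-time\s*:\s*(.*)", line)
        if match:
            timings[match.group(1)] = parse_seconds(match.group(2))
    return timings


timings = read_timings(REPORT)
ratio = timings["cpu"] / timings["wall"]
print(f"cpu = {timings['cpu']:.0f} s, wall = {timings['wall']:.0f} s, "
      f"cpu/wall = {ratio:.2f}")
```

For the run above this gives 111033 s of cpu-time against 99419 s of wall-time, a ratio of about 1.12, far from the value of 4 that I would naively expect for 4 fully busy threads.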
Do you think there is any way to speed up my calculations by optimizing I/O? I suspect that the I/O overhead is still not optimal in my configuration. But maybe I have already reached the intrinsic limit of ricc2_smp, and there is not much reason to optimize I/O?
Your comments will be greatly appreciated. Thank you in advance!
-
Hi,
the difference between cpu-time and wall-time cannot be used to measure the parallel speedup. For the OpenMP version, what is actually collected in the cpu-time depends on the platform and on the compiler. The wall-time should be ok, though. But for real speedup, you have to run the program on one and then on four threads.
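As a rough sketch of that comparison in Python: the speedup is the one-thread wall-time divided by the four-thread wall-time, and the parallel efficiency is the speedup divided by the number of threads. The function names and the two wall-times below are hypothetical placeholders, not measured ricc2_smp results.

```python
def speedup(wall_serial, wall_parallel):
    """Speedup = wall-time on one thread / wall-time on n threads."""
    return wall_serial / wall_parallel


def efficiency(wall_serial, wall_parallel, n_threads):
    """Parallel efficiency = speedup / n_threads (1.0 would be ideal)."""
    return speedup(wall_serial, wall_parallel) / n_threads


# Hypothetical wall-times in seconds, for illustration only.
wall_1_thread = 3600.0
wall_4_threads = 1200.0

s = speedup(wall_1_thread, wall_4_threads)
e = efficiency(wall_1_thread, wall_4_threads, 4)
print(f"speedup = {s:.2f}, efficiency = {e:.0%}")
```

Both runs should use the same input, the same machine and the same scratch setup, otherwise differences in I/O end up in the measured speedup.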
If the CPU usage is too often 10% or lower, it might be an indication that the setup of your scratch disk is not optimal. E.g. always make sure not to run on NFS partitions.
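One way to check which filesystem a directory lives on is sketched below in Python. It assumes a Linux system where /proc/mounts lists the mounted filesystems, and it uses the current working directory as a stand-in for the scratch directory. Both are assumptions for illustration, and the function name fs_type_for is made up.

```python
import os


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path.

    path must be absolute and already resolved (see os.path.realpath).
    mounts_text is the content of /proc/mounts. Mount points containing
    escaped characters such as spaces are not handled in this sketch.
    """
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        # ">=" lets a later mount on the same mount point win.
        if inside and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


# Assumption: the job runs in (and writes scratch files to) the current directory.
if os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as handle:
        mounts = handle.read()
    scratch_dir = os.path.realpath(os.getcwd())
    fs_type = fs_type_for(scratch_dir, mounts)
    on_nfs = fs_type is not None and fs_type.startswith("nfs")
    print(f"{scratch_dir}: {fs_type}" + ("  (NFS, better avoided)" if on_nfs else ""))
```

The same lookup also reports other network filesystems by type name, so the result is worth reading even when it is not NFS.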
Hope that helps,
Arnim
-
> The wall-time should be ok, though. But for real speedup, you have
> to run the program on one and then on four threads.
We will of course run these tests (1 core vs. 4 cores, etc.).
I wonder if you have any results of such parallel benchmarks? It would be very nice to see them on the Turbomole webpage.
> If the CPU usage is too often 10% or lower, it might be an indication that the setup
> of your scratch disk is not optimal. E.g. always make sure not to run on NFS partitions.
No NFS for sure ;) It's Lustre connected through InfiniBand, so quite a fast configuration. But I suspect that the I/O rate is still not optimal, so we will perhaps try a local disk or try to optimize the filesystem further.
Thank you for your answer.