Dear Turbomole Developers,
I wonder whether you have any benchmarks showing how efficient the OpenMP parallelization in ricc2_smp is.
With this module I have performed a few RI-CCSD(T) calculations, each nominally running on 4 threads. However, the actual CPU usage oscillated between 10% and 390%. The final timings
------------------------------------------------------------------------
total cpu-time : 1 days 6 hours 50 minutes and 33 seconds
total wall-time : 1 days 3 hours 36 minutes and 59 seconds
------------------------------------------------------------------------
show, indeed, only a small difference between the cpu-time and the wall-time: the wall-time is shorter, but only slightly. As far as I am aware, for a program running ideally on 4 cores the cpu-time (summed over all threads) would be about 4 times larger than the wall-time. Is that right?
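To put a number on this, here is a small Python sketch of the arithmetic. It is only a rough estimate: the helper name `to_seconds` is my own, the figures are simply the ones printed above, and I am assuming that the reported cpu-time is summed over all threads.

```python
def to_seconds(days, hours, minutes, seconds):
    """Convert a days/hours/minutes/seconds timing into seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

# Timings copied from the output above.
cpu = to_seconds(1, 6, 50, 33)    # total cpu-time
wall = to_seconds(1, 3, 36, 59)   # total wall-time
threads = 4                       # number of threads requested

# With cpu-time summed over threads, the ideal ratio equals the thread count.
ratio = cpu / wall
efficiency = ratio / threads

print(f"cpu/wall ratio: {ratio:.2f}")
print(f"parallel efficiency on {threads} threads: {efficiency:.0%}")
```

Under that assumption the ratio comes out at about 1.12 instead of 4, i.e. roughly 28% of the ideal 4-thread performance.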
Do you think there is any way to speed up my calculations by optimizing I/O? I suspect that the I/O overhead in my configuration is still not optimal. Or have I already reached the intrinsic limit of ricc2_smp, so that there is not much reason to optimize I/O?
Your comments will be greatly appreciated. Thank you in advance!