Hello,
RI-K is not parallelized at all in rdgrad. The method was developed as a way to speed up the Hartree-Fock step of MP2 calculations. RI-K works best for medium-sized molecules with larger basis sets (at least TZVP), and only if you have enough memory to keep the RI matrices completely in RAM.
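To get a feeling for the memory requirement, here is a rough back-of-the-envelope sketch in Python. It assumes that the dominant storage is a dense three-index quantity (P|mu nu) in double precision with the symmetric mu/nu pair packed. That layout is my assumption for illustration, not a statement about how the program actually stores things, and the function names and the example sizes are hypothetical.

```python
def ri_three_index_bytes(n_basis: int, n_aux: int, bytes_per_value: int = 8) -> int:
    """Estimate the bytes needed for a dense three-index tensor (P|mu nu).

    Assumption: one value per auxiliary function P and per unique
    (mu, nu) pair with mu >= nu, stored as 8-byte doubles.
    """
    n_pairs = n_basis * (n_basis + 1) // 2  # unique symmetric basis-function pairs
    return n_aux * n_pairs * bytes_per_value


def fits_in_ram(n_basis: int, n_aux: int, ram_gib: float) -> bool:
    """Return True if the estimated tensor fits into ram_gib GiB of memory."""
    return ri_three_index_bytes(n_basis, n_aux) <= ram_gib * 1024**3


if __name__ == "__main__":
    # Hypothetical sizes: 1000 basis functions, 3000 auxiliary functions.
    n_basis, n_aux = 1000, 3000
    gib = ri_three_index_bytes(n_basis, n_aux) / 1024**3
    print(f"estimated storage: {gib:.1f} GiB")
    print("fits in 16 GiB:", fits_in_ram(n_basis, n_aux, 16))
```

The estimate grows with the square of the number of basis functions times the number of auxiliary functions, which is why larger molecules quickly stop fitting in memory.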
For DFT jobs the basis sets are usually smaller, so there RI-K can even be slower than the conventional (non-RI-K) DFT calculation.
B3-LYP works really well with RI-J only (i.e. without RI-K), also in parallel. In that case a so-called linear-scaling exchange algorithm is used, which makes the calculation faster than the non-RI-J case by a factor of two or more (the speed-up depends on the size of the molecule, but if you are running in parallel, I assume it is not a small job).
Regards,
Uwe