Dear Guillaume,
The Minnesota functionals are said to be grid sensitive, so it is best to use at least a medium-sized grid such as "gridsize 3" or even a large grid such as "gridsize 4". Additionally, you can turn on the XC weight derivatives, which is often helpful to avoid the behavior you observed. Then, the DFT group reads as follows.
$dft
functional m06-2x
gridsize 3
weight derivatives on
Usually, it should read "gridsize" instead of "grid" in the data group $dft. This avoids M-grids (cf. the output "iterations will be done with small grid"). There is no reason to use the M-grids with hybrid functionals: your computation time is dominated by exact exchange, so using smaller grids in the SCF iterations only costs accuracy. Additionally, the M-grids can lead to issues for some functionals and response properties. With gridsize 3, it starts to get better. The norm of the gradient is already small in the first cycle, and I ran jobex for five cycles.
cycle = 1 SCF energy = -2665.2124386120 |dE/dxyz| = 0.004486
cycle = 2 SCF energy = -2665.2123311550 |dE/dxyz| = 0.004276
cycle = 3 SCF energy = -2665.2123607860 |dE/dxyz| = 0.003078
cycle = 4 SCF energy = -2665.2124498240 |dE/dxyz| = 0.002643
cycle = 5 SCF energy = -2665.2125095600 |dE/dxyz| = 0.001462
I assume that the geometry convergence behavior will get even better with gridsize 4. These are the results from the first few cycles. After a "wrong" initial move, which is properly detected (gradient and energy both get worse), the optimization recovers and the geometry starts to relax again.
cycle = 1 SCF energy = -2665.2131558740 |dE/dxyz| = 0.001065
cycle = 2 SCF energy = -2665.2128582950 |dE/dxyz| = 0.006480
cycle = 3 SCF energy = -2665.2129874270 |dE/dxyz| = 0.004899
cycle = 4 SCF energy = -2665.2130925560 |dE/dxyz| = 0.003729
cycle = 5 SCF energy = -2665.2131519440 |dE/dxyz| = 0.002038
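If you want to check such a listing quickly, here is a minimal Python sketch. It assumes the lines have exactly the format shown above; the helper names parse_cycles and worse_steps are made up for illustration and are not part of TURBOMOLE. It only flags steps where both the energy and the gradient norm went up compared to the previous cycle, and it says nothing about how jobex itself handles such a step.

```python
import re

# Matches lines in the format shown above, e.g.
# cycle = 1 SCF energy = -2665.2131558740 |dE/dxyz| = 0.001065
LINE = re.compile(
    r"cycle\s*=\s*(\d+)\s+SCF energy\s*=\s*(-?\d+\.\d+)\s+\|dE/dxyz\|\s*=\s*(\d+\.\d+)"
)


def parse_cycles(text):
    """Return a list of (cycle, energy, gradient_norm) tuples (hypothetical helper)."""
    return [(int(c), float(e), float(g)) for c, e, g in LINE.findall(text)]


def worse_steps(cycles):
    """Return the cycle numbers where energy AND gradient norm both increased."""
    bad = []
    for (_, e_prev, g_prev), (c, e, g) in zip(cycles, cycles[1:]):
        if e > e_prev and g > g_prev:
            bad.append(c)
    return bad


# The gridsize 4 listing from above, pasted in as a string.
log = """
cycle = 1 SCF energy = -2665.2131558740 |dE/dxyz| = 0.001065
cycle = 2 SCF energy = -2665.2128582950 |dE/dxyz| = 0.006480
cycle = 3 SCF energy = -2665.2129874270 |dE/dxyz| = 0.004899
cycle = 4 SCF energy = -2665.2130925560 |dE/dxyz| = 0.003729
cycle = 5 SCF energy = -2665.2131519440 |dE/dxyz| = 0.002038
"""

cycles = parse_cycles(log)
print(worse_steps(cycles))  # prints [2]
```

For the gridsize 4 listing this flags only cycle 2, i.e. the "wrong" initial move. For the gridsize 3 listing it flags nothing, since the gradient norm decreases in every step.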
If neither a larger gridsize nor weight derivatives help, you can turn off pruning with "fullshell on" or increase the number of radial points using radsize (see manual). Alternatively, you can use the a-grids (gridsize 4a etc.), which set the number of radial points based on the atom number instead of the row of the periodic table, as done with the usual grids.
Hope this helps.
Best wishes,
Yannick