This only concerns poly normals computation, and gives the usual 10% speedup of the affected code from switching OMP -> BLI_task.
Also parallelized the 'weighted accum' part (used when computing both poly and vertex normals,
e.g. when using modifiers), which makes it roughly 3.3x faster (from 66ms to 20ms, e.g. for a
500k poly monkey with a Simple Deform modifier). ;)