One drawback to trying to predict the number of threads that will be
used in the `task_graph` is that we are only sure of the number when the
threads are running.
Using `BLI_task_parallel_range` allows the driver to
choose the best thread distribution through `parallel_reduce`.
The benefit is most evident on hardware with fewer cores.
This is the result on an 4-core laptop:
||before:|after:
|---|---|---|
|large_mesh_editing:|Average: 5.203638 FPS|Average: 5.398925 FPS
||rdata 15ms iter 43ms (frame 193ms)|rdata 14ms iter 36ms (frame 187ms)
Differential Revision: https://developer.blender.org/D11558