Tune the grain size used for the `parallel_for` to alleviate excessive
mutex contention inside `handle_fan_result_and_custom_normals`.
I noticed that the 4004 Moore Lane USD scene[1] had a load time
regression compared to the prior release. The cause appears to be the
grain size used. Here are 3-run averages of the import time for several
grain sizes:
```
Grain                     | Time in seconds
256 (main)                | (14.6 + 14.6 + 14.8) / 3 = 14.6667
1024                      | (13.0 + 12.8 + 12.9) / 3 = 12.9
4096                      | (13.3 + 13.1 + 13.1) / 3 = 13.1667
16384                     | (12.2 + 12.0 + 12.5) / 3 = 12.2333
65536                     | (9.4 + 9.2 + 9.6) / 3 = 9.4
131072                    | (7.9 + 7.7 + 8.0) / 3 = 7.8667
262144                    | (7.3 + 7.1 + 7.2) / 3 = 7.2
max(16384, #verts/2) (PR) | (7.1 + 6.9 + 6.8) / 3 = 6.9333
```
With this PR, the scene loads in just under 7 seconds, compared to over
14 seconds before.
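As a rough illustration of why a larger grain helps, the sketch below computes the PR's grain size, max(16384, #verts/2), and an approximate count of chunks the loop range is split into. The helper names `pr_grain_size` and `approx_chunk_count` are hypothetical and are not Blender's API. The sketch also assumes, for illustration only, that the range size equals the vertex count and that each chunk holds about one grain of elements. The real TBB partitioner used by `threading::parallel_for` may split the range differently.

```cpp
#include <cstdint>

// Hypothetical helper (not Blender's API): the grain size from this PR,
// max(16384, #verts / 2). Meshes below 32768 vertices fall back to 16384.
constexpr std::int64_t pr_grain_size(const std::int64_t verts_num)
{
  const std::int64_t half = verts_num / 2;
  return half > 16384 ? half : std::int64_t(16384);
}

// Hypothetical helper: rough number of chunks for a range of `range_size`
// elements if each chunk holds about `grain_size` elements (ceiling division).
// Actual partitioning may differ; this only shows the order of magnitude.
constexpr std::int64_t approx_chunk_count(const std::int64_t range_size,
                                          const std::int64_t grain_size)
{
  return (range_size + grain_size - 1) / grain_size;
}
```

For a hypothetical range of 1,000,000 vertices, a grain of 256 gives about 3907 chunks, while `pr_grain_size(1000000)` is 500000 and gives 2 chunks. With fewer, larger chunks, fewer tasks compete for the mutex, which is consistent with the timings above. The 16384 floor keeps small meshes in a single chunk.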
[1] https://dpel.aswf.io/4004-moore-lane/
Pull Request: https://projects.blender.org/blender/blender/pulls/141249