Optimize mesh normal calculation.
- Remove the intermediate `lnors_weighted` array, accumulate directly
into the normal array using a spin-lock for thread safety.
- Remove single threaded iteration over loops
(normal calculation is now fully multi-threaded).
- Remove stack array (alloca) for pre-calculating edge-directions.
Summary of Performance Characteristics:
- The largest gains are for single high poly meshes, with isolated
normal-calculation benchmarks of meshes over ~1.5 million showing
2x+ speedup, ~25 million polygons are ~2.85x faster.
- Single lower poly meshes (250k polys) can be ~2x slower.
Since these meshes aren't normally a bottleneck,
and this problem isn't noticeable on large scenes,
we considered the performance trade-off reasonable.
- The performance difference reduces with larger scenes,
tests with production files from "Sprite Fight" showing
the same or slightly better overall performance.
NOTE: tested on a AMD Ryzen TR 3970X 32-Core.
For more details & benchmarking scripts, see the patch description.
Reviewed By: mont29
Ref D11993