Mesh: Tune the parallelism of normals_calc_corners
Tune the grain size used for the parallel_for to alleviate excessive mutex contention inside `handle_fan_result_and_custom_normals`. I happened to notice that the 4004 Moore Lane USD scene[1] experienced a load time regression compared to the prior release. It looks due to the grain size used and here are some 3-run averages for the import: ``` Grain | Time in seconds 256 (main) | (14.6+14.6+14.8)/3 = 14.6667 1024 | (13+12.8+12.9)/3 = 12.9 4096 | (13.3+13.1+13.1)/3 = 13.1667 16384 | (12.2+12+ 12.5)/3 = 12.2333 65536 | (9.4+9.2+9.6)/3 = 9.4 131072 | (7.9+7.7+8)/3 = 7.8667 262144 | (7.3+7.1+7.2)/3 = 7.2 max(16384, #verts/2) (PR) | (7.1+6.9+6.8)/3 = 6.9333 ``` This PR gets the scenario loading in just under 7 seconds now compared to over 14 originally. [1] https://dpel.aswf.io/4004-moore-lane/ Pull Request: https://projects.blender.org/blender/blender/pulls/141249
This commit is contained in:
committed by
Jesse Yurkovich
parent
d92523d0b4
commit
c8a4026984
@@ -1243,7 +1243,14 @@ void normals_calc_corners(const Span<float3> vert_positions,
|
||||
r_fan_spaces->corners_by_space.reserve(corner_verts.size());
|
||||
}
|
||||
}
|
||||
threading::parallel_for(vert_positions.index_range(), 256, [&](const IndexRange range) {
|
||||
|
||||
int64_t grain_size = 256;
|
||||
/* Decrease parallelism in case where lock is used to avoid contention. */
|
||||
if (!custom_normals.is_empty() || r_fan_spaces) {
|
||||
grain_size = std::max(int64_t(16384), vert_positions.size() / 2);
|
||||
}
|
||||
|
||||
threading::parallel_for(vert_positions.index_range(), grain_size, [&](const IndexRange range) {
|
||||
Vector<VertCornerInfo, 16> corner_infos;
|
||||
LocalEdgeVectorSet local_edge_by_vert;
|
||||
Vector<VertEdgeInfo, 16> edge_infos;
|
||||
|
||||
Reference in New Issue
Block a user