Mesh: Further optimize topology map creation
We need a separate array that we can change in during the parallel group construction. That array tells where in each group the index is added. Building this array is expensive, since construcing a new `Array` fills its elements serially. There are two possible solutions: 1. Use a copy of the offsets to increment result indices directly 2. Rely on OS-optimized `calloc` instead of `malloc` and a copy/fill Both depend on using `fetch_and_add` instead of `add_and_fetch`. The vertex to corner and edge to corner map creation is optimized by this commit, though the benefits will be useful elsewhere in the future. | | Before | 1. offsets copy | 2. calloc | | -------- | ------- | --------------- | --------------- | | Grid 1m | 3.1 ms | 1.9 ms (1.63x) | 1.8 ms (1.72x) | | Grid 16m | 51.8 ms | 33.3 ms (1.55x) | 32.7 ms (1.58x) | This commit implements the calloc solution, since it's slightly faster and simpler. In the future, `Array` could do this optimization itself when it detects that its fill value is just zero bytes. Pull Request: https://projects.blender.org/blender/blender/pulls/112065
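The approach above can be sketched in standalone C++. This is a simplified, hypothetical version: Blender's real code uses its own `Array`, `OffsetIndices`, `threading::parallel_for`, and `atomic_fetch_and_add_int32`, while this sketch uses plain `std::vector` and `std::atomic`. The loop is written serially here for brevity, but the atomic `fetch_add` is exactly what makes it safe to split across threads:

```cpp
#include <atomic>
#include <vector>

/* `group_indices[i]` names the group that element i belongs to, and
 * `offsets[g]` is the start of group g in the flat result array.
 * `counts` is zero-initialized (the role `calloc` plays in the commit), and
 * `fetch_add` returns the value *before* the increment, so each element
 * claims the next free slot in its group: 0, 1, 2, ... */
std::vector<int> reverse_indices_in_groups(const std::vector<int> &group_indices,
                                           const std::vector<int> &offsets)
{
  /* Count-constructed vector value-initializes the atomics to zero. */
  std::vector<std::atomic<int>> counts(offsets.size());
  std::vector<int> results(group_indices.size());
  /* Serial here; the real code runs this loop body in parallel chunks. */
  for (int i = 0; i < int(group_indices.size()); i++) {
    const int group = group_indices[i];
    const int index_in_group = counts[group].fetch_add(1);
    results[offsets[group] + index_in_group] = i;
  }
  return results;
}
```

For example, with five elements split into two groups (`group_indices = {1, 0, 1, 1, 0}`, group 0 starting at offset 0 and group 1 at offset 2), each result slot ends up holding the index of the element that claimed it.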
@@ -328,12 +328,18 @@ static Array<int> reverse_indices_in_groups(const Span<int> group_indices,
   }
   BLI_assert(*std::max_element(group_indices.begin(), group_indices.end()) < offsets.size());
   BLI_assert(*std::min_element(group_indices.begin(), group_indices.end()) >= 0);
-  Array<int> counts(offsets.size(), -1);
+
+  /* `counts` keeps track of how many elements have been added to each group, and is incremented
+   * atomically by many threads in parallel. `calloc` can be measurably faster than a parallel fill
+   * of zero. Alternatively the offsets could be copied and incremented directly, but the cost of
+   * the copy is slightly higher than the cost of `calloc`. */
+  int *counts = MEM_cnew_array<int>(size_t(offsets.size()), __func__);
+  BLI_SCOPED_DEFER([&]() { MEM_freeN(counts); });
   Array<int> results(group_indices.size());
   threading::parallel_for(group_indices.index_range(), 1024, [&](const IndexRange range) {
     for (const int64_t i : range) {
       const int group_index = group_indices[i];
-      const int index_in_group = atomic_add_and_fetch_int32(&counts[group_index], 1);
+      const int index_in_group = atomic_fetch_and_add_int32(&counts[group_index], 1);
       results[offsets[group_index][index_in_group]] = int(i);
     }
   });
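The diff hinges on the difference between the two atomic primitives. `add_and_fetch` returns the value *after* the increment, which is why the old code had to fill `counts` with `-1`; `fetch_and_add` returns the value *before* the increment, so zero-filled memory straight from `calloc` works directly. A minimal sketch using `std::atomic` (the hypothetical wrapper names mirror Blender's BLI atomic functions, but the mapping is an assumption):

```cpp
#include <atomic>

/* Returns the old value, like atomic_fetch_and_add_int32: a zero-initialized
 * counter yields slots 0, 1, 2, ... with no fill step needed. */
inline int fetch_and_add(std::atomic<int> &value, int delta)
{
  return value.fetch_add(delta);
}

/* Returns the new value, like atomic_add_and_fetch_int32: to get the same
 * slot sequence, the counter must start at -1 (the old code's fill value). */
inline int add_and_fetch(std::atomic<int> &value, int delta)
{
  return value.fetch_add(delta) + delta;
}
```

Swapping the primitive is what lets both proposed solutions (the offsets copy and the `calloc` allocation) skip the serial `-1` fill entirely.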
||||