This improves performance by **reducing** the number of threads used for tasks which require a high memory bandwidth.

This works because the underlying hardware has a certain maximum memory bandwidth. If that is already used up by a few threads, any additional threads wanting to use a lot of memory just cause more contention, which actually slows things down. By reducing the number of threads that can perform certain tasks, the remaining threads are also not locked up doing work that they can't do efficiently. It's best if there is enough scheduled work so that these threads can do more compute-intensive tasks instead.

To use this new functionality, one has to put the parallel code in question into a `threading::memory_bandwidth_bound_task(...)` block. Additionally, one also has to provide a (very) rough approximation of how many bytes are accessed. If the number is low, the number of threads shouldn't be reduced because it's likely that all touched memory fits in the L3 cache, which generally has a much higher bandwidth than main memory.

The exact number of threads that are allowed to do bandwidth-bound tasks at the same time is generally highly context- and hardware-dependent. It's also not really possible to measure reliably because it depends on so many static and dynamic factors. For now, the thread count is hardcoded to 8. It seems that this many threads are easily capable of maxing out the bandwidth capacity.

With this technique I can measure surprisingly good performance improvements:
* Generating a 3000x3000 grid: 133ms -> 103ms.
* Generating a mesh line with 100'000'000 vertices: 212ms -> 189ms.
* Realize mesh instances resulting in ~27'000'000 vertices: 460ms -> 305ms.

In all of these cases, only 8 instead of 24 threads are used. The remaining threads are idle in these cases, but they could do other work if available.

Pull Request: https://projects.blender.org/blender/blender/pulls/118939
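The idea described above can be sketched in a few lines of standalone C++. This is only an illustration under stated assumptions, not Blender's implementation: the names `threads_for_task` and `parallel_for_limited` are hypothetical, the 8 MiB cache threshold is an assumed stand-in for "fits in L3 cache", and plain `std::thread` is used in place of Blender's task scheduler. Only the limit of 8 threads is taken from the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Assumed threshold below which the touched memory likely stays in cache.
// The value is illustrative, it is not stated in the commit message.
constexpr int64_t cache_threshold_bytes = 8 * 1024 * 1024;
// Thread limit for bandwidth-bound work, as stated in the commit message.
constexpr int max_bandwidth_threads = 8;

// Decide how many threads may work on a task that touches roughly
// `approximate_bytes_touched` bytes of memory.
inline int threads_for_task(const int64_t approximate_bytes_touched,
                            const int available_threads)
{
  assert(available_threads >= 1);
  if (approximate_bytes_touched <= cache_threshold_bytes) {
    // Small working set: do not reduce the thread count.
    return available_threads;
  }
  // Large working set: more threads would only add contention.
  return std::min(available_threads, max_bandwidth_threads);
}

// Simple chunked parallel loop that uses at most `thread_count` threads.
// `fn` is called as fn(begin, end) once per chunk.
template<typename Fn>
void parallel_for_limited(const int64_t size, int thread_count, const Fn &fn)
{
  thread_count = std::max(1, thread_count);
  const int64_t chunk = (size + thread_count - 1) / thread_count;
  std::vector<std::thread> workers;
  for (int t = 0; t < thread_count; t++) {
    const int64_t begin = t * chunk;
    const int64_t end = std::min(size, begin + chunk);
    if (begin >= end) {
      break;
    }
    workers.emplace_back([&fn, begin, end]() { fn(begin, end); });
  }
  for (std::thread &worker : workers) {
    worker.join();
  }
}
```

With 24 hardware threads, `threads_for_task(2000000000, 24)` returns 8, while a task touching only a few kilobytes keeps all 24. In this sketch the threads that are not used are simply never started; in a real task scheduler they would stay available for other, more compute-intensive work.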
/* SPDX-FileCopyrightText: 2023 Blender Authors
 *
 * SPDX-License-Identifier: GPL-2.0-or-later */

#include "BLI_bounds.hh"

#include "BKE_mesh.hh"

#include "GEO_mesh_primitive_line.hh"

namespace blender::geometry {

Mesh *create_line_mesh(const float3 start, const float3 delta, const int count)
{
  if (count < 1) {
    return nullptr;
  }

  Mesh *mesh = BKE_mesh_new_nomain(count, count - 1, 0, 0);
  MutableSpan<float3> positions = mesh->vert_positions_for_write();
  MutableSpan<int2> edges = mesh->edges_for_write();

  threading::memory_bandwidth_bound_task(positions.size_in_bytes() + edges.size_in_bytes(), [&]() {
    threading::parallel_invoke(
        1024 < count,
        [&]() {
          threading::parallel_for(positions.index_range(), 4096, [&](IndexRange range) {
            for (const int i : range) {
              positions[i] = start + delta * i;
            }
          });
        },
        [&]() {
          threading::parallel_for(edges.index_range(), 4096, [&](IndexRange range) {
            for (const int i : range) {
              edges[i][0] = i;
              edges[i][1] = i + 1;
            }
          });
        });
  });

  mesh->tag_loose_verts_none();
  mesh->tag_overlapping_none();
  mesh->bounds_set_eager(*bounds::min_max<float3>({start, start + delta * count}));

  return mesh;
}

}  // namespace blender::geometry
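To get a feeling for the byte estimate that `create_line_mesh` passes to `threading::memory_bandwidth_bound_task`, the following standalone sketch redoes the arithmetic for the 100'000'000 vertex line from the commit message. `Float3`, `Int2` and `line_mesh_bytes_touched` are hypothetical stand-ins introduced here, assuming that `float3` is 12 bytes and `int2` is 8 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for Blender's float3 and int2, assumed to be tightly packed.
struct Float3 {
  float x, y, z;
};
struct Int2 {
  int32_t x, y;
};
static_assert(sizeof(Float3) == 12, "three 4-byte floats expected");
static_assert(sizeof(Int2) == 8, "two 4-byte integers expected");

// Rough number of bytes written when building a line mesh with `count`
// vertices: one position per vertex plus one edge per pair of neighbors.
inline int64_t line_mesh_bytes_touched(const int64_t count)
{
  assert(count >= 1);
  const int64_t position_bytes = count * int64_t(sizeof(Float3));
  const int64_t edge_bytes = (count - 1) * int64_t(sizeof(Int2));
  return position_bytes + edge_bytes;
}

// line_mesh_bytes_touched(100000000) == 1999999992, i.e. roughly 2 GB.
```

At roughly 2 GB the data is far larger than any CPU cache, so this is a case where limiting the thread count is expected to help. A line with only a few thousand vertices touches a few tens of kilobytes and would likely stay in cache.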