I've done this a few times and would have benefited from a utility
function for it, apparently it's done in a few more places too. The
utilities aren't multithreaded for now, it doesn't seem important
and often multithreading happens at a different level of the call
stack anyway.
Pull Request: https://projects.blender.org/blender/blender/pulls/127517
This adds a variant of `accumulate_counts_to_offsets` which checks for
overflows. The hot loop stays essentially the same, it just uses a `int64_t`
instead of `int` for the counter now. For now the error state is returned by
using an `std::optional`. Alternatives could be to throw `std::overflow_error`
or to use some Result/Expected type in the future.
Obviously, there are more places that should handle this kind of error. It's
also not obvious how to propagate that error further up yet so that we can
display e.g. a warning in the node. That decision should be applicable to other
nodes too. For now, there is no warning on the node.
Pull Request: https://projects.blender.org/blender/blender/pulls/127184
This improves performance by **reducing** the amounts of threads used for tasks
which require a high memory bandwidth.
This works because the underlying hardware has a certain maximum memory
bandwidth. If that is used up by a few threads already, any additional threads
wanting to use a lot of memory will just cause more contention which actually
slows things down. By reducing the number of threads that can perform certain
tasks, the remaining threads are also not locked up doing work that they can't
do efficiently. It's best if there is enough scheduled work so that these tasks
can do more compute intensive tasks instead.
To use this new functionality, one has to put the parallel code in question into
a `threading::memory_bandwidth_bound_task(...)` block. Additionally, one also
has to provide a (very) rough approximation for how many bytes are accessed. If
the number is low, the number of threads shouldn't be reduced because it's
likely that all touched memory can be in L3 cache which generally has a much
higher bandwidth than main memory.
The exact number of threads that are allowed to do bandwidth bound tasks at the
same time is generally highly context and hardware dependent. It's also not
really possible to measure reliably because it depends on so many static and
dynamic factors. The thread count is now hardcoded to 8. It seems that this many
threads are easily capable of maxing out the bandwidth capacity.
With this technique I can measure surprisingly good performance improvements:
* Generating a 3000x3000 grid: 133ms -> 103ms.
* Generating a mesh line with 100'000'000 vertices: 212ms -> 189ms.
* Realize mesh instances resulting in ~27'000'000 vertices: 460ms -> 305ms.
In all of these cases, only 8 instead of 24 threads are used. The remaining
threads are idle in these cases, but they could do other work if available.
Pull Request: https://projects.blender.org/blender/blender/pulls/118939
The logic can be much simpler when curves are selected rather than points,
because then we just copy all of the points in each curve. Like some other
operators, implement both cases.
Listing the "Blender Foundation" as copyright holder implied the Blender
Foundation holds copyright to files which may include work from many
developers.
While keeping copyright on headers makes sense for isolated libraries,
Blender's own code may be refactored or moved between files in a way
that makes the per file copyright holders less meaningful.
Copyright references to the "Blender Foundation" have been replaced with
"Blender Authors", with the exception of `./extern/` since these this
contains libraries which are more isolated, any changed to license
headers there can be handled on a case-by-case basis.
Some directories in `./intern/` have also been excluded:
- `./intern/cycles/` it's own `AUTHORS` file is planned.
- `./intern/opensubdiv/`.
An "AUTHORS" file has been added, using the chromium projects authors
file as a template.
Design task: #110784
Ref !110783.
Add an offset indices utility to do fill constant size new offsets in
parallel, which was already done in the duplicate elements node.
For example, filling poly offsets for a new part of a mesh that is only
quads. In the extrude node this was single-threaded before, so the
new poly offsets is about 10x faster, saving about 10 out of 157 ms
when extruding 2 million faces.
This utility counts the number of occurrences of each index in an array.
This is used for building mesh topology maps offsets, or for counting
the number of connected elements. Some users are geometry nodes,
the subdivision draw cache, and mesh to curve conversion.
See #109628
Replace the implementation of the separate and delete geometry nodes
for meshes. The new code makes more use of the `IndexMask` class, which
was recently optimized. The main goal is to make more of the work scale
with the size of the result mesh rather than the input. For example,
instead of keeping a map from input to output elements, the maps used
to copy attributes go from output to input elements.
The new implementation is generally 2-4x faster, depending on the mode
and the number of elements selected. The new code is also able to skip
more work when nothing is removed.
This also allows using more existing attribute interpolation code,
allowing the overall removal of over 300 lines. Some of the attribute
utilities from a similar change for curves (f63cfd8e28) are
reused directly.
The indices of the result changes, so the test file needs to be updated.
Pull Request: https://projects.blender.org/blender/blender/pulls/108435
A lot of files were missing copyright field in the header and
the Blender Foundation contributed to them in a sense of bug
fixing and general maintenance.
This change makes it explicit that those files are at least
partially copyrighted by the Blender Foundation.
Note that this does not make it so the Blender Foundation is
the only holder of the copyright in those files, and developers
who do not have a signed contract with the foundation still
hold the copyright as well.
Another aspect of this change is using SPDX format for the
header. We already used it for the license specification,
and now we state it for the copyright as well, following the
FAQ:
https://reuse.software/faq/
Combine the newer less efficient C++ implementations and the older
less convenient C functions. The maps now contain one large array of
indices, split into groups by a separate array of offset indices.
Though performance of creating the maps is relatively unchanged, the
new implementation uses 4 bytes less per source element than the C
maps, and 20 bytes less than the newer C++ functions (which also
had more overhead with larger N-gons). The usage syntax is simpler
than the C functions as well.
The reduced memory usage is helpful for when these maps are cached
in the near future. It will also allow sharing the offsets between
maps for different domains like vertex to corner and vertex to face.
A simple `GroupedSpan` class is introduced to make accessing the
topology maps much simpler. It combines offset indices and a separate
span, splitting it into chunks in an efficient way.
Pull Request: https://projects.blender.org/blender/blender/pulls/107861
The "reverse map" of corners to faces and points to curves is the same
for meshes and curves now. Move it to the offset indices header to
reflect this.
This unification can go further in the future, but I'd rather wait
until the design is clearer for now.
Pull Request: https://projects.blender.org/blender/blender/pulls/106570
This adds a new `Interpolate Curves` node. It allows generating new curves
between a set of existing guide curves. This is essential for procedural hair.
Usage:
- One has to provide a set of guide curves and a set of root positions for
the generated curves. New curves are created starting from these root
positions. The N closest guide curves are used for the interpolation.
- An additional up vector can be provided for every guide curve and
root position. This is typically a surface normal or nothing. This allows
generating child curves that are properly oriented based on the
surface orientation.
- Sometimes a point should only be interpolated using a subset of the
guides. This can be achieved using the `Guide Group ID` and
`Point Group ID` inputs. The curve generated at a specific point will
only take the guides with the same id into account. This allows e.g.
for hair parting.
- The `Max Neighbors` input limits how many guide curves are taken
into account for every interpolated curve.
Differential Revision: https://developer.blender.org/D16642
This changes how we access the points that correspond to each curve in a `CurvesGeometry`.
Previously, `CurvesGeometry::points_for_curve(int curve_index) -> IndexRange`
was called for every curve in many loops. Now one has to call
`CurvesGeometry::points_by_curve() -> OffsetIndices` before the
loop and use the returned value inside the loop.
While this is a little bit more verbose in general, it has some benefits:
* Better standardization of how "offset indices" are used. The new data
structure can be used independent of curves.
* Allows for better data oriented design. Generally, we want to retrieve
all the arrays we need for a loop first and then do the processing.
Accessing the old `CurvesGeometry::points_for_curve(...)` did not follow
that design because it hid the underlying offset array.
* Makes it easier to pass the offsets to a function without having to
pass the entire `CurvesGeometry`.
* Can improve performance in theory due to one less memory access
because `this` does not have to be dereferenced every time.
This likely doesn't have a noticable impact in practice.
Differential Revision: https://developer.blender.org/D17025