- Avoid using geometry sets from a different abstraction level
- Deduplicate basic attribute copying propagation code
- Allow more use of implicit sharing when data arrays are unchanged
- Optimize for when a point cloud delete selection is empty
- Handle face corners generically for "only face" case
Pass the curves and points to keep instead of delete. In the same test
file as the previous commit, this gave an increase from 50 to 60 FPS
when deleting curves.
In several nodes, and sculpt brushes, use the proper `IndexMask` type
instead of a span of ranges to encode a selection. This allows deleting
the duplicate data copying utilities using the second format.
Using the new index mask implementation, things can be a bit simpler.
It's also simpler to use `complement` instead of `to_ranges_invert`,
which just made everything less standard. Also create the new curve
offsets in place instead of copying, and use implicit sharing to share
attributes when no curves were deleted.
With a version of `IndexMask::complement()` optmized locally, I observed
the following speedups with a 1.2 million point curve system:
- Delete points: 29 FPS -> 45 FPS
- Delete curves: 48 FPS -> 49 FPS
- Delete tip points: 25 FPS -> 32 FPS
Also add a method to apply the "gather" function to all attributes,
mostly as a continued experiment of consolidating attribute propagation.
This can be used more elsewhere in the future.
Does not happen very often, but that weak handling of copying linked
data as linked data currently can lead to an invalid namemap in Main.
This is a known issue, fixing it requires addressing #107847.
In the mean time, work around it by re-validating and fixing the namemap
after the problematic liboverride calls.
NOTE: only identified issue currently is the proxy conversion of linked
proxies. The other cases *should* be fine.
Found while investigating issues when opening the
`lib/tests/libraries_and_linking/libraries/main_scene.blend` file.
Logic in `main_namemap_validate_and_fix` could end up re-generating
a thousand of time the names of IDs because of an invalid assumption
about processed IDs being re-processable (in case they get renamed).
Also do not `CLOG_ERROR` when checking and fixing errors, if this code is
called to fix errors, it means errors are expected. Use `CLOG_INFO`
instead, or `CLOG_WARN` when the info is really important (like when IDs
had to be renamed).
And finally, simplify code clearing invalid namemaps, there is now a
function to handle this task, `BKE_main_namemap_clear`.
Issues & improvements found while working on readfile errors when
opening `lib/tests/libraries_and_linking/libraries/main_scene.blend`.
Pressing 'A' to select all pose bones in an armature caused a segfault,
as `id->override_library->runtime->tag` was checked while
`id->override_library->runtime` was `nullptr`. An extra check resolved
the crash.
Pressing 'A' to select all pose bones in an armature caused a segfault,
as `id->override_library->runtime->tag` was checked while
`id->override_library->runtime` was `nullptr`. An extra check resolved
the crash.
The runtime backup/restore logic was slightly wrong: it is possible that
an object requires light linking runtime but does not need light linking
itself. This is typical configuration for the receivers/blockers.
Modified the logic so that the evaluated object light linking is allocated
if there was a runtime field needed.
This required to make it so light linking evaluation takes care of feeing
the light_linking if it is empty. The downside of this approach is a
redundant allocation from the object backup when removing light linking
collection from emitter. But this is not a typical evaluation flow, and
the more typical flows are cheap with this approach.
Pull Request: https://projects.blender.org/blender/blender/pulls/108261
Goals of this refactor:
* Reduce memory consumption of `IndexMask`. The old `IndexMask` uses an
`int64_t` for each index which is more than necessary in pretty much all
practical cases currently. Using `int32_t` might still become limiting
in the future in case we use this to index e.g. byte buffers larger than
a few gigabytes. We also don't want to template `IndexMask`, because
that would cause a split in the "ecosystem", or everything would have to
be implemented twice or templated.
* Allow for more multi-threading. The old `IndexMask` contains a single
array. This is generally good but has the problem that it is hard to fill
from multiple-threads when the final size is not known from the beginning.
This is commonly the case when e.g. converting an array of bool to an
index mask. Currently, this kind of code only runs on a single thread.
* Allow for efficient set operations like join, intersect and difference.
It should be possible to multi-thread those operations.
* It should be possible to iterate over an `IndexMask` very efficiently.
The most important part of that is to avoid all memory access when iterating
over continuous ranges. For some core nodes (e.g. math nodes), we generate
optimized code for the cases of irregular index masks and simple index ranges.
To achieve these goals, a few compromises had to made:
* Slicing of the mask (at specific indices) and random element access is
`O(log #indices)` now, but with a low constant factor. It should be possible
to split a mask into n approximately equally sized parts in `O(n)` though,
making the time per split `O(1)`.
* Using range-based for loops does not work well when iterating over a nested
data structure like the new `IndexMask`. Therefor, `foreach_*` functions with
callbacks have to be used. To avoid extra code complexity at the call site,
the `foreach_*` methods support multi-threading out of the box.
The new data structure splits an `IndexMask` into an arbitrary number of ordered
`IndexMaskSegment`. Each segment can contain at most `2^14 = 16384` indices. The
indices within a segment are stored as `int16_t`. Each segment has an additional
`int64_t` offset which allows storing arbitrary `int64_t` indices. This approach
has the main benefits that segments can be processed/constructed individually on
multiple threads without a serial bottleneck. Also it reduces the memory
requirements significantly.
For more details see comments in `BLI_index_mask.hh`.
I did a few tests to verify that the data structure generally improves
performance and does not cause regressions:
* Our field evaluation benchmarks take about as much as before. This is to be
expected because we already made sure that e.g. add node evaluation is
vectorized. The important thing here is to check that changes to the way we
iterate over the indices still allows for auto-vectorization.
* Memory usage by a mask is about 1/4 of what it was before in the average case.
That's mainly caused by the switch from `int64_t` to `int16_t` for indices.
In the worst case, the memory requirements can be larger when there are many
indices that are very far away. However, when they are far away from each other,
that indicates that there aren't many indices in total. In common cases, memory
usage can be way lower than 1/4 of before, because sub-ranges use static memory.
* For some more specific numbers I benchmarked `IndexMask::from_bools` in
`index_mask_from_selection` on 10.000.000 elements at various probabilities for
`true` at every index:
```
Probability Old New
0 4.6 ms 0.8 ms
0.001 5.1 ms 1.3 ms
0.2 8.4 ms 1.8 ms
0.5 15.3 ms 3.0 ms
0.8 20.1 ms 3.0 ms
0.999 25.1 ms 1.7 ms
1 13.5 ms 1.1 ms
```
Pull Request: https://projects.blender.org/blender/blender/pulls/104629
The main issue was the fact that if a Scene is overridden, it's content
will be fully invalidated when updating the liboverride at the end of
the file reading process. Since the FileData keeps a pointer to the
active view layer, it needs to be udated then.
As a side consequence, the liblinking of global data also needs to
happen before liboverrides are updated.
Embedded IDs (master collection of scene, etc.) do not exist in the Main
data-base. However, their tags should follow these from their owners. So
e.g. if a scene is in Main, its master collection should not be tagged
as no-main.
NOTE: this is somewhat also related to our ID tags sanitizing TODO task
(#88555).
Found while invesigating #107913.
Combine the newer less efficient C++ implementations and the older
less convenient C functions. The maps now contain one large array of
indices, split into groups by a separate array of offset indices.
Though performance of creating the maps is relatively unchanged, the
new implementation uses 4 bytes less per source element than the C
maps, and 20 bytes less than the newer C++ functions (which also
had more overhead with larger N-gons). The usage syntax is simpler
than the C functions as well.
The reduced memory usage is helpful for when these maps are cached
in the near future. It will also allow sharing the offsets between
maps for different domains like vertex to corner and vertex to face.
A simple `GroupedSpan` class is introduced to make accessing the
topology maps much simpler. It combines offset indices and a separate
span, splitting it into chunks in an efficient way.
Pull Request: https://projects.blender.org/blender/blender/pulls/107861
This adds `char *simulation_bake_directory` to the nodes modifier. The path is automatically generated the first time the modifier is baked. It is _not_ automatically changed afterwards. The path is relative to the .blend file by default. For now, the path is not exposed in the UI or Python API.
This fixes issues where renaming objects/modifiers can cause the baked data to not work anymore.
Pull Request: https://projects.blender.org/blender/blender/pulls/108201
Before 9f78530d80, the -1 coarse_edge_index values in the
foreach_edge calls would return false in BLI_BITMAP_TEST_BOOL,
which made them look like loose edges. BitSpan doesn't have this
problem, so the return for negative indices must be explicit.
Some `ImagePartialUpdateTest` test are calling code that needs access to
a valid `G_MAIN`. So store the generated main there as part of the setup
step, and reset G_MAIN to its original value (should be NULL) in the
teardown step.
NOTE: Things like `ID_BLEND_PATH_FROM_GLOBAL` and
`BKE_main_blendfile_path_from_global` are pure evil. It may be necessary
in a very few small cases, but their current usages need a lot of strong
cleanup.
Pull Request: https://projects.blender.org/blender/blender/pulls/108189
The usual special ShapeKey case needs yet another extra corner case
special handling... See comments in code for details about that specific
issue.
NOTE: May be worth checking if this can be backported to 3.3 LTS too.
Covers the macro ARRAY_SIZE() and STRNCPY.
The problem this change is aimed to solve it to provide cross-platform
compiler-independent safe way pf ensuring that the functions are used
correctly.
The type safety was only ensured for GCC and only for C. The C++
language and Clang compiler would not have detected issues of passing
bare pointer to neither of those macros.
Now the STRNCPY() will only accept a bounded array as the destination
argument, on any compiler.
The ARRAY_SIZE as well, but there are a bit more complications to it
in terms of transparency of the change.
In one place the ARRAY_SIZE was used on float3 type. This worked in the
old code because the type implements subscript operator, and the type
consists of 3 floats. One would argue this is somewhat hidden/implicit
behavior, which better be avoided. So an in-lined value of 3 is used now
there.
Another place is the ARRAY_SIZE used to define a bounded array of the
size which matches bounded array which is a member of a struct. While
the ARRAY_SIZE provides proper size in this case, the compiler does not
believe that the value is known at compile time and errors out with a
message that construction of variable-size arrays is not supported.
Solved by converting the field to std::array<> and adding dedicated
utility to get size of std::array at compile time. There might be a
better way of achieving the same result, or maybe the approach is
fine and just need to find a better place for such utility.
Surely, more macro from the BLI_string.h can be covered with the C++
inlined functions, but need to start somewhere.
There are also quite some changes to ensure the C linkage is not
enforced by code which includes the headers.
Pull Request: https://projects.blender.org/blender/blender/pulls/108041
Allows to share buffer data between the render result and image buffers.
The storage of the passes and buffers in the render result have been
wrapped into utility structures, with functions to operate on them.
Currently only image buffers which are sharing buffers with the render
results are using the implicit sharing. This allows proper decoupling of
the image buffers from the lifetime of the underlying render result.
Fixes#107248: Compositor ACCESS VIOLATION when updating datablocks from handlers
Additionally, this lowers the memory usage of multi-layer EXR sequences
by avoiding having two copies of render passes in memory.
It is possible to use implicit sharing in more places, but needs
some API to ensure the render result is the only owner of data before
writing to its pixels.
Pull Request: https://projects.blender.org/blender/blender/pulls/108045