I've done this a few times and would have benefited from a utility
function for it, apparently it's done in a few more places too. The
utilities aren't multithreaded for now, it doesn't seem important
and often multithreading happens at a different level of the call
stack anyway.
Pull Request: https://projects.blender.org/blender/blender/pulls/127517
For `SubdivCCG`, replace the dynamic array-of-struct data with separate
arrays for positions, normals, and masks. This decreases memory
bandwidth requirements for loops that only access one of these arrays.
It also significantly simplifies data access, making it more similar
to mesh data where it can just be accessed with indices.
In a simple test this change speeds up the multires sculpt brush
benchmark by 32%. It also slightly reduced BVH build time by about 12%.
In practice this means completely replacing usage of `CCGElem` for
`SubdivCCG`. The struct is still used in the older subsurf baking code
though. Removing that is a much trickier task that doesn't have short
term benefits.
Part of #118145.
Pull Request: https://projects.blender.org/blender/blender/pulls/127262
Reduce VBO data uploaded to the GPU by a bit less than 2x, at the cost of
creating triangle index buffers. Since the index buffers only need to be
created when topology changes, this means significantly less data needs
to be uploaded to the GPU while sculpting. The hot loops for extracting
mesh data also don't need to access triangles or face visibility, and also
drawing can become more efficient with indices for cached vertex values.
Part of #118145.
Pull Request: https://projects.blender.org/blender/blender/pulls/127351
Enable deferred parallel batch compilation for image renders.
This replaces the use of the `WM_job` system with a regular thread,
since `WM_job` requires access to the main context,
which is not accessible from the render thread.
It also simplifies the system so it creates a single thread at startup
and deletes it at exit.
Pull Request: https://projects.blender.org/blender/blender/pulls/125005
Somehow this untested implementation made it in, I thought I had
tested this. The mistake I made was imagining the VBOs were "indexed"
using the node vertex indices, but we use a flat duplicate-per-triangle
vertex VBO format.
Part of #118145.
In the future we want to move the ownership of the BVH tree to the
original mesh rather than `SculptSession`, in order to persist it
across some more general non-topology changing operations like
node tools. The first step of that change is replacing all places
that used `SculptSession::pbvh` for access with a function.
Add `.faces()`, `.verts()`,`.all_verts()`, and `.grids()` functions to nodes.
Don't change BMesh data access yet. More complex const correctness
makes that situation more difficult.
Pull Request: https://projects.blender.org/blender/blender/pulls/127201
Currently each sculpt BVH node stores the indices of its triangles.
It also stores triangles of vertex indices local to the node, and also
potentially the indices of the face corners in the node.
The problem with this is that the leaf nodes store plenty of redundant
information. The triangles in each face aren't split up between multiple
nodes, so storing triangle instead of face indices is unnecesssary. For
the local vertex triangles, there is also duplicate information-- twice
the number of indices as necessary for quad meshes. We also often need
a node's faces, which is currently done with a triangle to face map
using 4 bytes per triangle.
This "double storage" results in extra processing too. For example,
during BVH builds we need to combine twice the number of "Bounds"
objects for a quad mesh. And we have to recalculate a node's face
indices every time we want them.
This commit replaces the PBVH triangle indices with face indices, and
replaces the local vertex triangles array by using a `VectorSet` to store
each node's vertex indices. This results in significant performance and
memory usage improvements.
| | Before | After | Improvement |
| ----------------- | -------- | -------- | ----------- |
| BVH build time | 1.29 s | 0.552 s | 2.3x |
| Brush stroke time | 3.57 s | 2.52 s | 1.4x |
| Memory usage | 4.14 GiB | 3.66 GiB | 1.3x |
All testing is done with a 16 million vertex grid and a Ryzen 7950x.
My guess is that the brush stroke time is improved by the added sorting
of node vertex indices, and by the overall increase in memory bandwidth
availability for mesh data. Personally I'm pleasantly surprised by the
whole improvement, since I usually try to avoid hash table data
structures for this sort of use case. But a lookup into this set ends up
just being a boolean and with an array lookup, so it's quite cheap.
Pull Request: https://projects.blender.org/blender/blender/pulls/127162
This commit rewrites the PBVH drawing using many of the principles from the
ongoing sculpt refactor. First of all, per BVH node overhead is minimized.
Previously the main entry point to the drawing API was per node, so there
was significant overhead fetching global data and maintaining caches on
a per-node basis. Now all of that "global" work happens for the entire
geometry.
We also now avoid creating wireframe index buffers and batches unless
the viewport actually requests wireframe data. This was theoretically
possible before, but the whole logic flow was so convoluted that the
optimization was too difficult. Similarly, multithreading is used more
consistently now. Because of OpenGL, flushing vertex/index buffers to
the GPU has to happen on the main thread, but everything else can be
multithreaded. With outer loops processing all relevant PBVH nodes,
it's now trivial to apply multithreading wherever possible.
Testing performance, overall this commit results in a 10% improvement in
the time between opening a file with a large mesh sculpt and the first
possible interaction. Specifically I measured a change from 8.4 to 7.6
seconds on a completely visible 16 million vertex mesh with a Ryzen 7950x.
I also measured a decrease in memory usage from 4.79 to 4.31 GB.
For multires I observed a similar improvement in memory usage,
though less of a performance improvement.
There are still significant opportunities for future improvement. #122775
would be particularly helpful. #99983 would be helpful too, though more
complicated, and #97665 describes the problems a bit more generally.
Part of #118145.
Pull Request: https://projects.blender.org/blender/blender/pulls/127002
Functional Changes:
- Custom shapes using empties now supports line width.
- Line width is supported on MacOS.
- Fixed Stick bone drawing on MacOS.
Some shaders are duplicated and ported to the new
primitive expansion API.
The legacy code inside `overlay_armature.cc` have been
guarded behind `NO_LEGACY_OVERLAY` which can
be enabled to make sure no legacy code is used unnoticed.
This allows for spotting more easily code that needs to be
ported. Moreover, it is easier to remove this legacy code
when the time comes.
Rel #102179
Pull Request: https://projects.blender.org/blender/blender/pulls/126474
The change was accidentally done in #121383 which primarily concerned
itself with overlay text colors, but started to use TH_BACK theme color
for the draw manager text (e.g. for geometry nodes value visualization)
outline color. Change behavior to use black or white outline color, based
on lightness of text color.
Pull Request: https://projects.blender.org/blender/blender/pulls/127071
The term "tool" is historic from before the actual tool system got
introduced. Since then the term was already a bit confusing, because it
wasn't directly related to the tool system, but there was still some
relationship between the two. Now brushes and their types are decoupled
much more from the tool system, with a single "Brush" tool supporting
all kinds of brushes (draw, grab, cloth, smooth, ...).
For a more clear terminology, use "brush type" instead of "tool".
For #126032 we need to write the brush type to the asset metadata (done
in !124618), so we can filter brushes based on the type (so the grease
pencil eraser tool only shows eraser brushes, for example). I'd like to
use future proof names for that to avoid versioning of asset metadata in
future, so I'd rather do the full naming change now.
RNA properties (thus BPY names) are not changed for compatibility
reasons. Can be done in 5.0, see blender/blender#124201.
Pull Request: https://projects.blender.org/blender/blender/pulls/126796
Add back clipping using the same GL clip planes as before.
The difference is that the clip planes are now stored in the
`GlobalsUboStorage` instead of relying on another separate
UBO.
One annoyance of the current design is that the `overlay::Instance`
has to be created with the clipping state. This could be fixed later
by making the shader module a pointer instead of a reference.
Rel #102179
Pull Request: https://projects.blender.org/blender/blender/pulls/127018
Reorganization to make the retrieval of data from the arguments struct
more explicit, combined with a bit of renaming. Mostly to make a future
diff visually simpler.
This changes how the lazy-loading and unloading of volume grids works. With that
it should also fix#124164.
The cache is now moved to a deeper and more global level. This allows reloadable
volume grids to be unloaded automatically when a memory limit is reached. The
previous system for automatically unloading grids only worked in fairly specific
cases and also did not work all that well with caching (parts of) volume
sequences.
At its core, this patch adds a general cache system in `BLI_memory_cache.hh`. It
has a simple interface of the form `get(key, compute_if_not_cached_fn) ->
value`. To avoid growing the cache indefinitly, it uses the new
`BLI_memory_counter.hh` API to detect when the cache size limit is reached. In
this case it can automatically free some cached values. Currently, this uses an
LRU system, where the items that have not been used in a while are removed
first. Other heuristics can be implemented too, but especially for caches for
loading files from disk this works well already.
The new memory cache is internally used by `volume_grid_file_cache.cc` for
loading individual volume grids and their simplified variants. It could
potentially also be used to cache which grids are stored in a file.
Additionally, it can potentially also be used as caching layer in more places
like loading bakes or in import geometry nodes. It's not clear yet whether this
will need an extension to the API which currently is fairly minimal.
To allow different systems to use the same memory cache, it has to support
arbitrary identifiers for the cached data. Therefore, this patch also introduces
`GenericKey`, which is an abstract base class for any kind of key that is
comparable, hashable and copyable.
The implementation of the cache currently relies on a new `ConcurrentMap`
data-structure which is a thin wrapper around `tbb::concurrent_hash_map` with a
fallback implementation for when `tbb` is not available. This data structure
allows concurrent reads and writes to the cache. Note that adding data to the
cache is still serialized because of the memory counting.
The size of the cache depends on the `memory_cache_limit` property that's
already shown in the user preferences. While it has a generic name, it's
currently only used by the VSE which is currently using the `MEM_CacheLimiter`
API which has a similar purpose but seems to be less automatic, thread-safe and
also has no idea of implicit-sharing. It also seems to be designed in a way
where one is expected to create multiple "cache limiters" each of which has its
own limit. Longer term, we should probably strive towards unifying these
systems, which seems feasible but a bit out of scope right now. While it's not
ideal that these cache systems don't use a shared memory limit, it's essentially
what we already have for all cache systems in Blender, so it's nothing new.
Some tests for lazy-loading had to be removed because this behavior is more
implicit now and is not as easily observable from the outside.
Pull Request: https://projects.blender.org/blender/blender/pulls/126411
This increases playback performance from 2.9fps to 3.2fps in the test file from #126391.
The check is unnecessary in draw code, because we know that the depsgraph
finished evaluation before. These checks were introduced to handle dependency
cycles during depsgraph evaluation.
At some point it may be nice to look into making these checks cheaper to avoid having
to use the unchecked version for performance reasons.
Part of #118145.
Similar in concept to recent commits removing the usage of
this mesh pointer in favor of fetching the data as necessary.
Also see recent discussion in a recent fix for this area:
https://projects.blender.org/blender/blender/pulls/122850.
And also note the comment for `Tree::mesh_` was incorrect.
The mesh was the original mesh, not the evaluated mesh.
Part of #118145.
There is some complexity in this area because the normals need to be
updated on the original geometry only when there is no deformation
or multires modifier. The simplest way to encapsulate that usage of
the original geometry for now was adding a separate function that
contains the lookup with a comment justifying it.
d1049f6082 translated the logic from the `int8` to `float`
conversion incorrectly. Previously the data was extracted to an integer
array (multiplied by 254) first then translated to a float VBO (and
divided by 255). I think conceptually the special `1.0/255` value isn't
necessary because the "optimal display edges" area already taken into
account for the creation of the index buffer. But as a quick fix, just
fix the factor to be the same as before.
Pull Request: https://projects.blender.org/blender/blender/pulls/126290
Because the previous fix stopped creating these VBOs when they
would be empty we need more null checks. Alternatively still
creating them but not binding them might be a better solution
but just adding null checks seems like the simpler approach
right now.
Pull Request: https://projects.blender.org/blender/blender/pulls/126073
Assert in debug build, allocated vertex buffer has garbage data when
total points in the drawing are zero, Nothing is copied to vertex buffers
from `copy_transformed_positions()`. To fix the crash, exit early when
the number of points is zero and execute `DRW_shgroup_call_no_cull`
macro when batch is non-null.
Pull Request: https://projects.blender.org/blender/blender/pulls/126134
There were two issues. One was that the normals VBO wasn't created
with the correct size. The other was that there were empty VBOs created
which can apparently also cause crashes.
Pull Request: https://projects.blender.org/blender/blender/pulls/126041