This adds feature parity with Cycles regarding light and shadow liking.
Technically, this extends the GBuffer header to 32 bits, and uses
the top bits to store the object's light set membership index.
The same index is also added to `ObjectInfo` in place of padding bytes.
For shadow linking, the shadow blocker sets bitmask is stored per
tilemap. It is then used during the GPU culling phase to cull objects
that do not belong to the shadow's sets.
Co-authored-by: Clément Foucault <foucault.clem@gmail.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/127514
Remove the indirection previously used for the topology refiner
to separate C and C++ code. Instead retrieve the base level in
calling code and call opensubdiv API functions directly. This
avoids copying arrays of mesh indices and should reduce
function call overhead since index retrieval can now be inlined.
It also lets us remove a lot of boilerplate shim code.
The downside is increased need for WITH_OPENSUBDIV defines
in various parts of blenkernel, but I think that is required to avoid
the previous indirection and have the kernel deal with OpenSubdiv
more directly.
Pull Request: https://projects.blender.org/blender/blender/pulls/120825
Memory allocated to vertex buffer but the total verts are zero. This
triggers assert in bind function when `vbo_size_` is zero. Exit out of
funtion early to prevent the assert.
Also wrap `DRW_shgroup_call_no_cull` with if check.
Pull Request: https://projects.blender.org/blender/blender/pulls/128248
The first commit ensures IsectBoxes are not set up unless
they are valid.
The second commit renames
`drw_bounds_are_valid` to `drw_bounds_corners_are_valid`,
and `drw_bounds_culling_enabled` to `drw_bounds_are_valid`
so it's harder to set up an invalid `IsectBox` by mistake.
(Continuation of #127807)
Pull Request: https://projects.blender.org/blender/blender/pulls/128125
Adds antialiasing to curve's handles and thickness to active ones.
Also handles now react to
`Preferences > Interface > Display > Resolution Scale` and
`Preferences > Themes > 3D Viewport > Edge Width` as they do in
legacy curves.
Pull Request: https://projects.blender.org/blender/blender/pulls/122910
Currently the `sum_group_sizes` call used to count the number
of elements in GPU vertex buffers for each PBVH node stands out
in profiles, taking a few percent of the total time when building
PBVH GPU data. Since the number of corners in a node doesn't
change, it's simpler to just store it in the node. We could
eventually cache it somewhere else too, if there was a benefit
to not storing it in the node itself.
Object with degenerate transform matrix can lead to flat
bounds on GPU. This in turn lead to NaN intersection planes
inside `IsectBox`.
Compute (pseudo) size of matrix and bypass culling is any
axis is too small.
The other part of the patch makes sure that there is a
distinction between disabled culling and invalid
bounding boxes.
Pull Request: https://projects.blender.org/blender/blender/pulls/127807
These buffers are bound as texture buffers and will have
their backend object created whether or not the subsequent
drawcall is emited.
Ensuring a minimum size fixes the issue.
Updating attributes is now done specifically for each attribute,
and topology and visibility updates tag the draw cache data
explicitly too. That makes checking for updates with these
tags unnecessary.
Similar to the previous commits, previously all GPU buffers
were recreated when only masks changed. Now only
that masks are re-uploaded, using the same mechanism.
Part of #118145.
Similar to the previous commit, previously all GPU buffers
were recreated when only a color attribute changed. Now only
that attribute is re-uploaded, using the same mechanism.
Part of #118145.
Similar to the previous commit, previously all GPU buffers
were recreated when only face sets changed. Now only face
sets are re-uploaded, using the same mechanism.
Part of #118145.
Previously tagging node positions dirty set the `PBVH_UpdateDrawBuffers`
flag which was then used by `node_draw_update_mask` and
`tag_all_attributes_dirty`, which, as hinted by the name, causes a refresh
of all the PBVH GPU vertex buffers, not just positions and normals.
Instead of using that flag, add a function to the draw cache to tag only
position-related data dirty. This should give a performance improvement
when there are also face sets, masks, and/or generic attributes being
extracted for drawing.
Part of $118145.
Add a new function to the draw manager to only
issue a unique resource ID per `ObjectRef`.
This avoids the memory overhead of duplicating
the handle for each overlay.
The handle is stored inside the `ObjectRef` on
first query.
Rel #102179
I've done this a few times and would have benefited from a utility
function for it, apparently it's done in a few more places too. The
utilities aren't multithreaded for now, it doesn't seem important
and often multithreading happens at a different level of the call
stack anyway.
Pull Request: https://projects.blender.org/blender/blender/pulls/127517
For `SubdivCCG`, replace the dynamic array-of-struct data with separate
arrays for positions, normals, and masks. This decreases memory
bandwidth requirements for loops that only access one of these arrays.
It also significantly simplifies data access, making it more similar
to mesh data where it can just be accessed with indices.
In a simple test this change speeds up the multires sculpt brush
benchmark by 32%. It also slightly reduced BVH build time by about 12%.
In practice this means completely replacing usage of `CCGElem` for
`SubdivCCG`. The struct is still used in the older subsurf baking code
though. Removing that is a much trickier task that doesn't have short
term benefits.
Part of #118145.
Pull Request: https://projects.blender.org/blender/blender/pulls/127262
Reduce VBO data uploaded to the GPU by a bit less than 2x, at the cost of
creating triangle index buffers. Since the index buffers only need to be
created when topology changes, this means significantly less data needs
to be uploaded to the GPU while sculpting. The hot loops for extracting
mesh data also don't need to access triangles or face visibility, and also
drawing can become more efficient with indices for cached vertex values.
Part of #118145.
Pull Request: https://projects.blender.org/blender/blender/pulls/127351
Enable deferred parallel batch compilation for image renders.
This replaces the use of the `WM_job` system with a regular thread,
since `WM_job` requires access to the main context,
which is not accessible from the render thread.
It also simplifies the system so it creates a single thread at startup
and deletes it at exit.
Pull Request: https://projects.blender.org/blender/blender/pulls/125005
Somehow this untested implementation made it in, I thought I had
tested this. The mistake I made was imagining the VBOs were "indexed"
using the node vertex indices, but we use a flat duplicate-per-triangle
vertex VBO format.
Part of #118145.
In the future we want to move the ownership of the BVH tree to the
original mesh rather than `SculptSession`, in order to persist it
across some more general non-topology changing operations like
node tools. The first step of that change is replacing all places
that used `SculptSession::pbvh` for access with a function.
Add `.faces()`, `.verts()`,`.all_verts()`, and `.grids()` functions to nodes.
Don't change BMesh data access yet. More complex const correctness
makes that situation more difficult.
Pull Request: https://projects.blender.org/blender/blender/pulls/127201
Currently each sculpt BVH node stores the indices of its triangles.
It also stores triangles of vertex indices local to the node, and also
potentially the indices of the face corners in the node.
The problem with this is that the leaf nodes store plenty of redundant
information. The triangles in each face aren't split up between multiple
nodes, so storing triangle instead of face indices is unnecesssary. For
the local vertex triangles, there is also duplicate information-- twice
the number of indices as necessary for quad meshes. We also often need
a node's faces, which is currently done with a triangle to face map
using 4 bytes per triangle.
This "double storage" results in extra processing too. For example,
during BVH builds we need to combine twice the number of "Bounds"
objects for a quad mesh. And we have to recalculate a node's face
indices every time we want them.
This commit replaces the PBVH triangle indices with face indices, and
replaces the local vertex triangles array by using a `VectorSet` to store
each node's vertex indices. This results in significant performance and
memory usage improvements.
| | Before | After | Improvement |
| ----------------- | -------- | -------- | ----------- |
| BVH build time | 1.29 s | 0.552 s | 2.3x |
| Brush stroke time | 3.57 s | 2.52 s | 1.4x |
| Memory usage | 4.14 GiB | 3.66 GiB | 1.3x |
All testing is done with a 16 million vertex grid and a Ryzen 7950x.
My guess is that the brush stroke time is improved by the added sorting
of node vertex indices, and by the overall increase in memory bandwidth
availability for mesh data. Personally I'm pleasantly surprised by the
whole improvement, since I usually try to avoid hash table data
structures for this sort of use case. But a lookup into this set ends up
just being a boolean and with an array lookup, so it's quite cheap.
Pull Request: https://projects.blender.org/blender/blender/pulls/127162
This commit rewrites the PBVH drawing using many of the principles from the
ongoing sculpt refactor. First of all, per BVH node overhead is minimized.
Previously the main entry point to the drawing API was per node, so there
was significant overhead fetching global data and maintaining caches on
a per-node basis. Now all of that "global" work happens for the entire
geometry.
We also now avoid creating wireframe index buffers and batches unless
the viewport actually requests wireframe data. This was theoretically
possible before, but the whole logic flow was so convoluted that the
optimization was too difficult. Similarly, multithreading is used more
consistently now. Because of OpenGL, flushing vertex/index buffers to
the GPU has to happen on the main thread, but everything else can be
multithreaded. With outer loops processing all relevant PBVH nodes,
it's now trivial to apply multithreading wherever possible.
Testing performance, overall this commit results in a 10% improvement in
the time between opening a file with a large mesh sculpt and the first
possible interaction. Specifically I measured a change from 8.4 to 7.6
seconds on a completely visible 16 million vertex mesh with a Ryzen 7950x.
I also measured a decrease in memory usage from 4.79 to 4.31 GB.
For multires I observed a similar improvement in memory usage,
though less of a performance improvement.
There are still significant opportunities for future improvement. #122775
would be particularly helpful. #99983 would be helpful too, though more
complicated, and #97665 describes the problems a bit more generally.
Part of #118145.
Pull Request: https://projects.blender.org/blender/blender/pulls/127002
Functional Changes:
- Custom shapes using empties now supports line width.
- Line width is supported on MacOS.
- Fixed Stick bone drawing on MacOS.
Some shaders are duplicated and ported to the new
primitive expansion API.
The legacy code inside `overlay_armature.cc` have been
guarded behind `NO_LEGACY_OVERLAY` which can
be enabled to make sure no legacy code is used unnoticed.
This allows for spotting more easily code that needs to be
ported. Moreover, it is easier to remove this legacy code
when the time comes.
Rel #102179
Pull Request: https://projects.blender.org/blender/blender/pulls/126474