Followup to 9b70851d91.
Return buffers by value rather than creating an empty/uninitialized
buffer first, then initializing it in an extraction function. This generally
makes the code easier to follow. And avoiding these half-created buffers
is an essential step to adding some sort of more global cache.
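As a rough illustration of the difference, the sketch below contrasts the two styles. `Buffer`, `extract_indices_into` and `extract_indices` are hypothetical names invented for this example, not Blender's actual types or functions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a GPU buffer; not Blender's actual type.
struct Buffer {
  std::vector<uint32_t> data;
};

// Before: the caller creates an empty buffer and the extractor fills it.
// Between the two steps the buffer exists in a half-created state.
void extract_indices_into(const int count, Buffer &r_buffer)
{
  r_buffer.data.resize(count);
  for (int i = 0; i < count; i++) {
    r_buffer.data[i] = uint32_t(i);
  }
}

// After: the extractor owns creation and returns a finished buffer.
Buffer extract_indices(const int count)
{
  Buffer buffer;
  buffer.data.resize(count);
  for (int i = 0; i < count; i++) {
    buffer.data[i] = uint32_t(i);
  }
  return buffer; /* Moved or elided; the payload is not copied. */
}
```

With the second form, no caller can ever observe a buffer that has been allocated but not yet filled.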
Pull Request: https://projects.blender.org/blender/blender/pulls/136570
The initial goal of this PR is to avoid creating vertex and index
buffers as part of the "request" phase of the drawing loop. Conflating
requesting and creating index buffers might not sound so bad, but it
ends up significantly complicating the whole process. It is also
incompatible with a future buffer cache that would allow avoiding
re-uploading mesh buffers.
Specifically, this means removing the use of `DRW_vbo_request` and
`DRW_ibo_request` from the mesh batch extraction process. Instead, a
list of buffer types is gathered based on the requested batches. Then
that list is filtered to find the batches that haven't been requested
yet. Overall I find the new process much easier to understand.
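A minimal sketch of that gather-then-filter flow is shown here. The enum values, the batch-to-buffer mapping, and the function names are all hypothetical, and the filter is assumed to drop buffer types that already exist.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical batch and buffer kinds; the real lists are much longer.
enum class BatchType { Surface, Edges, Points };
enum class BufferType { Position, Normal, Tris, Lines, Points };

// Which buffers each batch needs (illustrative mapping only).
std::set<BufferType> buffers_for_batch(const BatchType batch)
{
  switch (batch) {
    case BatchType::Surface:
      return {BufferType::Position, BufferType::Normal, BufferType::Tris};
    case BatchType::Edges:
      return {BufferType::Position, BufferType::Lines};
    case BatchType::Points:
      return {BufferType::Position, BufferType::Points};
  }
  return {};
}

// Gather the buffer types for all requested batches, then keep only
// those that are not available yet (assumed filter criterion).
std::vector<BufferType> gather_buffers_to_create(
    const std::vector<BatchType> &requested_batches,
    const std::set<BufferType> &existing)
{
  std::set<BufferType> needed;
  for (const BatchType batch : requested_batches) {
    const std::set<BufferType> buffers = buffers_for_batch(batch);
    needed.insert(buffers.begin(), buffers.end());
  }
  std::vector<BufferType> to_create;
  for (const BufferType type : needed) {
    if (existing.count(type) == 0) {
      to_create.push_back(type);
    }
  }
  return to_create;
}
```

Nothing is created while the list is being built, which keeps the "request" phase free of side effects.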
A few examples of simplifications this allows are avoiding allocating
`MeshRenderData` on the heap, and the removal of its `use_final_mesh`
member. That's just replaced by passing the necessary information
through the call stack.
Another notable difference is that for meshes, EEVEE's velocity module
now requests a batch that contains the buffer rather than just requesting
the buffer itself. This is just simpler to get working since it doesn't require
a separate code path.
The task graph argument for extraction is unused after this change. It wasn't
used effectively anyway; a simpler method of multithreading extractions is
used in this PR. I didn't remove it completely because it will probably be
repurposed in the next step of this project.
The next step in this project is to replace `MeshBufferList` with a
global cache that's keyed based on the mesh data that comprises each
batch, when possible (i.e. for non edit-mode meshes). The changes above
should be applied to other object types too.
Pull Request: https://projects.blender.org/blender/blender/pulls/135699
This refactors part of `draw_manager_c.cc` to make it more understandable
and less bug prone.
- Split the context handling out to `draw_gpu_context.cc`
- Rename `draw_manager_c.cc` to `draw_context.cc`
- Merge `DRWContextState` into `DRWContext`
- Merge lots of static functions into `DRWContext` to avoid global access
- Deduplicate code between entry point functions
- Move context init logic to `DRWContext` constructor
- Move resource init logic to `DRWContext::acquire_data`
- Move extraction `TaskGraph` out of `DRWContext`
- Reduce / centralize complexity of enabling draw engines
- Reduce the amount of `drw_get` calls
- Remove unused code
Pull Request: https://projects.blender.org/blender/blender/pulls/135821
Blender already had its own copy of OpenSubdiv containing some local fixes
and code-style changes. This code still used GL calls. This PR updates the
calls to use the GPU module, which allows OpenSubdiv to be used on other
backends as well.
This PR was tested on OpenGL, Vulkan and Metal. Metal can be enabled,
but Vulkan requires some API changes to work with loose geometry.

# Considerations
**ShaderCreateInfo**
intern/opensubdiv now requires access to the GPU module. This is needed to create
buffers in the correct context and trigger the correct dispatches. ShaderCreateInfo
is used to construct the shader for cross compilation to Metal/Vulkan. However, the
OpenSubdiv shader caching structures are still used.
**Vertex buffers vs storage buffers**
The implementation tries to stay as close as possible to the original OSD
implementation. Where it uses storage buffers for data, we use GPUStorageBuf.
Where it uses vertex buffers, we use gpu::VertBuf.
**Evaluator const**
The evaluator cannot be const anymore, as the GPU module API only allows
updating SSBOs when constructing them. The API could be improved to support
updating SSBOs.
The current implementation has a chance of reading out of bounds when constructing
SSBOs. An API change is planned to remove this issue, and it will be fixed in
an upcoming PR. We wanted to land this PR anyway, as the issue is rarely
visible and multiple other changes rely on this PR landing.
Pull Request: https://projects.blender.org/blender/blender/pulls/135296
`OpenSubdiv_Buffer` is a wrapper that was introduced at a time
when Blender couldn't use C++ directly. It contains a pointer to
a VertBuf and callbacks to use the GPU module on that buffer.
This PR replaces `OpenSubdiv_Buffer` with `blender::gpu::VertBuf` and
removes the wrapper.
NOTE: OpenSubdiv tests are added to the blender_test executable to keep the
library dependencies from becoming too complicated.
Pull Request: https://projects.blender.org/blender/blender/pulls/135389
This PR migrates `custom_data_interp_comp.glsl` to use
shader create info.
During development, specialization constants were tested,
but due to limitations in Metal we didn't use them.
The number of ShaderCreateInfos has been reduced by using macros. Variadic macros
have not been used as they don't support C++ compilation.
Pull Request: https://projects.blender.org/blender/blender/pulls/134932
In this case, the evaluator cache was never referenced and
the subdiv free queue was empty, so the cache was never
freed.
This function is called once per frame and is unlikely
to generate overhead by doing one lock.
This was caused by the 91de4a50ab refactor, which replaced
the evaluator cache singleton with a local variable.
At the time it was not known that the evaluators in the cache
are actually referenced by the modifier data.
To fix this and fix the thread-unsafety of the global
variable, a mutex is introduced around a reduced critical
section inside `draw_subdiv_create_requested_buffers`.
The global evaluator cache is now also refcounted to allow
freeing of the cache when no evaluator is referenced
anymore.
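The general pattern can be sketched as follows. This is only an illustration of a mutex-protected, refcounted global cache; `EvaluatorCache`, `cache_acquire`, `cache_release` and `cache_exists` are hypothetical names and do not reflect the actual code.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical cache type standing in for the evaluator cache.
struct EvaluatorCache {
  int entries = 0;
};

static std::mutex g_cache_mutex;
static EvaluatorCache *g_cache = nullptr;
static int g_cache_users = 0;

// Create the cache on first use and count the new user.
EvaluatorCache *cache_acquire()
{
  std::lock_guard<std::mutex> lock(g_cache_mutex);
  if (g_cache == nullptr) {
    g_cache = new EvaluatorCache();
  }
  g_cache_users++;
  return g_cache;
}

// Drop one user; free the cache once nobody references it anymore.
void cache_release()
{
  std::lock_guard<std::mutex> lock(g_cache_mutex);
  assert(g_cache_users > 0);
  g_cache_users--;
  if (g_cache_users == 0) {
    delete g_cache;
    g_cache = nullptr;
  }
}

bool cache_exists()
{
  std::lock_guard<std::mutex> lock(g_cache_mutex);
  return g_cache != nullptr;
}
```

Each critical section is short, so taking the lock once per frame should cost very little.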
Pull Request: https://projects.blender.org/blender/blender/pulls/134926
This change migrates the first 2 subdiv shaders to use the ShaderCreateInfo.
Other shaders will follow in separate PRs.
- Should compile when using `WITH_GPU_SHADER_CPP_COMPILATION`
- A `subdiv_` prefix is added only to the functions related to `PosNorLoop`.
But eventually the prefix should also be added to other lib functions.
- Due to Metal restrictions `subdiv_set_vertex_*` is implemented using a
functional paradigm. Our Metal backend only supports the `inout` qualifier
on thread local data structures.
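To illustrate the functional style mentioned in the last point, here is a C++ analogy (the real code is GLSL). The `PosNorLoop` layout and both function names are assumptions made for the example. Instead of mutating an element through an `inout`-like reference, the setter takes a copy, modifies it, and returns it, and the caller writes it back.

```cpp
#include <cassert>

// Hypothetical layout; the real PosNorLoop struct may differ.
struct PosNorLoop {
  float x, y, z;
  float nx, ny, nz;
};

// "inout" style: mutates the caller's data through a reference.
void set_vertex_pos_inout(PosNorLoop &vert, float x, float y, float z)
{
  vert.x = x;
  vert.y = y;
  vert.z = z;
}

// Functional style: takes a copy, returns the modified value.
PosNorLoop set_vertex_pos(PosNorLoop vert, float x, float y, float z)
{
  vert.x = x;
  vert.y = y;
  vert.z = z;
  return vert;
}
```

A caller would then write `verts[i] = set_vertex_pos(verts[i], x, y, z);`, so the buffer element itself is never passed by reference.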
Pull Request: https://projects.blender.org/blender/blender/pulls/134218
Subdivision had its own store of shaders. Best to move them to
`draw_shader.cc` where all draw manager related shaders are stored.
Includes some small tweaks:
- Use enum class for shader types
- Patch evaluation shaders must now be retrieved via
`DRW_shader_subdiv_get`. Previously there were 2 ways to retrieve
them, and one didn't support all the variations.
- Use strong types when possible (`GPUVertCompType`).
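The following sketch shows the general shape of an `enum class` combined with a single accessor. The enum values, the `Shader` struct and `subdiv_shader_get` are hypothetical; the real `DRW_shader_subdiv_get` has a different signature and returns actual GPU shaders.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical shader kinds; the real enum has more entries.
enum class SubdivShaderType : int { BufferLines = 0, BufferTris, PatchEvaluation };
constexpr std::size_t shader_type_count = 3;

// Stand-in for a compiled GPU shader.
struct Shader {
  std::string name;
};

const char *shader_name(const SubdivShaderType type)
{
  switch (type) {
    case SubdivShaderType::BufferLines:
      return "subdiv_lines";
    case SubdivShaderType::BufferTris:
      return "subdiv_tris";
    case SubdivShaderType::PatchEvaluation:
      return "subdiv_patch_evaluation";
  }
  return "";
}

// Single entry point: every shader kind is created lazily and cached here.
Shader &subdiv_shader_get(const SubdivShaderType type)
{
  static std::array<std::unique_ptr<Shader>, shader_type_count> cache;
  std::unique_ptr<Shader> &slot = cache[std::size_t(type)];
  if (!slot) {
    slot = std::make_unique<Shader>(Shader{shader_name(type)});
  }
  return *slot;
}
```

Because the enum is scoped, passing a plain integer or a value from an unrelated enum fails to compile.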
Pull Request: https://projects.blender.org/blender/blender/pulls/134213
Especially through DRW_render.hh, there were a lot of unnecessary
includes almost everywhere in the module. This typically makes
dependencies less explicit and slows down compile times, so switch
to including only what files actually use.
Pull Request: https://projects.blender.org/blender/blender/pulls/133450
Move `CD_CUSTOMLOOPNORMAL` to the newly added
`CD_PROP_INT16_2D` generic attribute type. This is similar to
previous commits moving specific custom data types.
The attribute name is `custom_normal`. When the attribute with
that name is on the face corner domain, the code will interpret it
as stored in the existing deformation-invariant spherical coordinate
space.
The API remains the same, with the additional opportunity to edit
custom normal data as an attribute directly (which admittedly is fairly
unintuitive currently).
See #130484.
Pull Request: https://projects.blender.org/blender/blender/pulls/130689
Avoid measuring the length of strings repeatedly by passing their
length along with their data with `StringRefNull`. Null termination
seems to be necessary still for passing the shader sources to OpenGL.
Though I doubt this is a bottleneck, it's still nice to avoid overhead from
string operations and this helps move in that direction.
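As a sketch of the idea, a reference type can carry the length next to a null-terminated pointer, so no later code has to call `strlen`. `StrRefNull`, `make_ref` and `join_sources` below are simplified, hypothetical stand-ins and not Blender's `StringRefNull` API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a null-terminated string reference.
struct StrRefNull {
  const char *data; /* Null-terminated, so it can still be passed to C APIs. */
  std::size_t size; /* Length without the terminator, measured only once. */
};

// The referenced std::string must outlive the returned reference.
StrRefNull make_ref(const std::string &str)
{
  return {str.c_str(), str.size()};
}

// Concatenate sources without re-measuring any of them.
std::string join_sources(const std::vector<StrRefNull> &sources)
{
  std::size_t total = 0;
  for (const StrRefNull &src : sources) {
    total += src.size;
  }
  std::string result;
  result.reserve(total);
  for (const StrRefNull &src : sources) {
    result.append(src.data, src.size);
  }
  return result;
}
```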
Pull Request: https://projects.blender.org/blender/blender/pulls/127702
Similar to 5e46e3d28a.
This commit replaces the C-API version of `OpenSubdiv_Evaluator`
with direct calls to `EvalOutputAPI`. This removes a level of indirection,
theoretically reducing function call overhead, but also making the whole
system easier to understand and easier to modify. The downside is
further spread of `WITH_OPENSUBDIV` into the code, but I think that
can be improved in the future relatively easily once more of this sort
of change is finished.
Pull Request: https://projects.blender.org/blender/blender/pulls/128278
Remove the indirection previously used for the topology refiner
to separate C and C++ code. Instead retrieve the base level in
calling code and call opensubdiv API functions directly. This
avoids copying arrays of mesh indices and should reduce
function call overhead since index retrieval can now be inlined.
It also lets us remove a lot of boilerplate shim code.
The downside is increased need for WITH_OPENSUBDIV defines
in various parts of blenkernel, but I think that is required to avoid
the previous indirection and have the kernel deal with OpenSubdiv
more directly.
Pull Request: https://projects.blender.org/blender/blender/pulls/120825
Because the previous fix stopped creating these VBOs when they
would be empty, we need more null checks. Alternatively, still
creating them but not binding them might be a better solution,
but just adding null checks seems like the simpler approach
right now.
Pull Request: https://projects.blender.org/blender/blender/pulls/126073
There were two issues. One was that the normals VBO wasn't created
with the correct size. The other was that there were empty VBOs created
which can apparently also cause crashes.
Pull Request: https://projects.blender.org/blender/blender/pulls/126041
Add a `.data<T>()` method that retrieves a mutable span. This is
increasingly useful as we change to filling in vertex buffer data arrays
directly, and compared to raw pointers it's also safer because of asserts
in debug builds.
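A rough sketch of what such an accessor can look like is given below. The `VertBuf` and `MutableSpan` types here are simplified, hypothetical versions written for this example and are not the real classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal span with a bounds check that is active in debug builds.
template<typename T> struct MutableSpan {
  T *ptr;
  std::size_t len;
  T &operator[](const std::size_t i) const
  {
    assert(i < len);
    return ptr[i];
  }
  std::size_t size() const
  {
    return len;
  }
};

struct float3 {
  float x, y, z;
};

// Hypothetical vertex buffer storing raw bytes with a fixed stride.
class VertBuf {
  std::vector<unsigned char> storage_;
  std::size_t stride_;
  std::size_t vert_num_;

 public:
  VertBuf(const std::size_t stride, const std::size_t vert_num)
      : storage_(stride * vert_num), stride_(stride), vert_num_(vert_num)
  {
  }

  // Typed view of the data; asserts that T matches the buffer's stride.
  template<typename T> MutableSpan<T> data()
  {
    assert(sizeof(T) == stride_);
    return {reinterpret_cast<T *>(storage_.data()), vert_num_};
  }
};
```

Compared to a raw pointer, both a mismatched element type and an out-of-range index trigger an assert in debug builds instead of silently corrupting memory.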
Pull Request: https://projects.blender.org/blender/blender/pulls/123338
The main change is to avoid storing redundant data in the subdivision
draw cache, mainly by replacing the reverse lookup from subdivided edge to
coarse edge. This way loops are structured as iteration over coarse
edges instead of iteration over subdivided edges with optional behavior
for vertices with matching base mesh faces. With that inversion the
information in the draw cache is trivial (or duplicated from an array
in `MeshRenderData`), so it's all removed, except for the subdivided
loose edge positions. That array is also shrunk though, by not
duplicating positions in between each subdivided edge. Its calculation
is more efficient for the same reason too.
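The sketch below illustrates iterating over coarse loose edges and storing each subdivided position only once. It assumes plain linear interpolation along each edge and uses hypothetical names (`Edge`, `subdivide_loose_edges`); the actual evaluation in the draw code may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct float3 {
  float x, y, z;
};

struct Edge {
  int v0, v1;
};

// For every coarse loose edge, write `segments + 1` positions. Neighboring
// segments share their common point, so nothing is stored twice.
std::vector<float3> subdivide_loose_edges(const std::vector<float3> &positions,
                                          const std::vector<Edge> &loose_edges,
                                          const int segments)
{
  assert(segments > 0);
  const std::size_t verts_per_edge = std::size_t(segments) + 1;
  std::vector<float3> result(loose_edges.size() * verts_per_edge);
  for (std::size_t e = 0; e < loose_edges.size(); e++) {
    const float3 a = positions[loose_edges[e].v0];
    const float3 b = positions[loose_edges[e].v1];
    for (std::size_t i = 0; i < verts_per_edge; i++) {
      /* Linear interpolation is an assumption made for this example. */
      const float t = float(i) / float(segments);
      result[e * verts_per_edge + i] = {
          a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t};
    }
  }
  return result;
}
```

With `segments` segments per coarse edge, this stores `segments + 1` positions instead of the `2 * segments` needed when both endpoints of every subdivided edge are duplicated.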
Overall, besides code simplification, the effect should be lower
overhead with loose edges with GPU subdivision. Admittedly this isn't
a very important use case, but it's part of a general refactor trying
to use better data oriented design in this area (#116901).
Pull Request: https://projects.blender.org/blender/blender/pulls/122071
Implements another phase of #116901, this time for the `lines` and
`lines_loose` index buffers that store indices for wireframe drawing.
The key improvement is removing loose edges' dependency on the main
edge index buffer. That means for the majority of meshes with no loose
edges, edge index extraction can be completely skipped. Even when there
are loose edges, only the loose edges need to be extracted with
wireframe turned off.
Besides that improvement, there are more changes to use data-oriented
code with visible hot loops instead of the virtual function call design
used for the existing mesh extractor system. For this step I completely
replaced the `extract_lines` object, which is in line with the general
plan for this area.
Additionally, hidden edge filtering is done ahead of time using several
`IndexMask` operations. This means only indices for visible edges need
to be uploaded to the GPU, and no restart index stripping needs to be
performed on macOS.
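The two-step approach can be sketched as follows, with a plain index vector standing in for `IndexMask`. The function names and the `Edge` struct are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Edge {
  uint32_t v0, v1;
};

// Step 1: collect the indices of visible edges ahead of time.
std::vector<int> visible_edges(const std::vector<bool> &hide_edge)
{
  std::vector<int> mask;
  for (int i = 0; i < int(hide_edge.size()); i++) {
    if (!hide_edge[i]) {
      mask.push_back(i);
    }
  }
  return mask;
}

// Step 2: write line indices only for the visible edges, so the result
// contains no placeholder entries that would have to be stripped later.
std::vector<uint32_t> extract_line_indices(const std::vector<Edge> &edges,
                                           const std::vector<int> &mask)
{
  std::vector<uint32_t> indices;
  indices.reserve(mask.size() * 2);
  for (const int i : mask) {
    indices.push_back(edges[i].v0);
    indices.push_back(edges[i].v1);
  }
  return indices;
}
```

The size of the index buffer is known before filling it, and the hot loop contains no visibility branch.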
On my usual test file with 1.9 million vertices, I observed an
improvement from 26 to 33 FPS with wireframe off, and from 9.15 to 9.5
FPS with wireframe on.
Pull Request: https://projects.blender.org/blender/blender/pulls/120720