Vulkan 1.0 render passes have been replaced by dynamic rendering in 1.2.
Blender Vulkan backend was implemented with dynamic rendering in mind.
All our supported platforms support dynamic rendering. Render pass support
was added to try to work around an issue with legacy drivers. However these
drivers also fail with render passes.
Using render passes had several limitations (blending and some workbench
features were not supported). As no GPU uses it and it is quite some code
to support it is better to remove it.
Pull Request: https://projects.blender.org/blender/blender/pulls/144149
Descriptor sets/pools are known to be troublesome as it doesn't match
how GPUs work, or how application want to work, adding more complexity
than needed. This results is quite an overhead allocating and
deallocating descriptor sets.
This PR will use descriptor buffers when they are available. Most platforms
support descriptor buffers. When not available descriptor pools/sets
will be used.
Although this is a feature I would like to land it in 4.5 due to the API changes.
This makes it easier to fix issues when 4.5 is released.
The feature can easily be disabled by setting the feature to false if it has
to many problems.
Pull Request: https://projects.blender.org/blender/blender/pulls/138266
When buffers/images are allocated that use larger limits than supported
by the GPU blender would crash. This PR adds some safety mechanism to
allow Blender to recover from allocation errors.
This has been tested on NVIDIA drivers.
Pull Request: https://projects.blender.org/blender/blender/pulls/139876
These functions are trivial and shouldn't add the cost of a call.
They appeared in profiles, which they shouldn't since they mostly
just return access to member variables. Inlining them reduces
the backend's overhead when sculpting.
Also reserve a Vector before repeated appending.
Pull Request: https://projects.blender.org/blender/blender/pulls/138349
For performance reasons render graphs can keep memory allocated so it
could be reused. This PR optimizes the memory usage inside the
rendergraph to keep it within normal usage.
I didn't detect any performance regression with this change but reduces
the memory when performing final image rendering of heavy scenes.
Partial fix for #137382. the amount of memory still increases with 4mb
per render. It fixes the main difference when using large scenes.
Pull Request: https://projects.blender.org/blender/blender/pulls/137660
A couple of memory leak fixes for the vulkan backend.
We increment the submission_id on render_graphs upon reset. This
triggers cleanup of anything tracked as a VKResourceTracker. Notably
uniform buffers created for push constant fallbacks. This fixes a memory
leak that was accumulating VKUniformBuffers every frame without cleaning
them up.
Reset resource pools when a swapchain image is presented. This ends up
calling vkResetDescriptorPool, freeing up descriptor set resources. This
fixes a memory leak that was accumulate descriptor sets and pools over
time without freeing them.
Pull Request: https://projects.blender.org/blender/blender/pulls/137305
This PR implements dynamic viewport state for the Vulkan gpu backend.
By doing so, it fixes#130914.
The following high-level changes were made:
1. The pipeline pool no longer uses the viewport and scissor
states to identify graphics pipelines, only the number of viewports
and the number of scissors. Graphics pipelines are configured with
dynamic viewport and scissor states upon construction.
2. The desired viewport and scissor configurations for drawing are set
in the data of the draw nodes in the render graph.
3. The draw nodes use these viewport and scissors settings in
`build_commands`. If the viewport and scissor settings have changed
between nodes, then vkCmdSetViewport and vkCmdSetScissor commands
are sent to the command buffer.
4. Tests are updated to verify that set_viewport and set_scissor commands
are executed the correct number of times. (Also note that I needed to
#136987 in order to avoid skipping some Vulkan tests).
See the attached screencast for verification. The number of graphics pipelines
no longer grow when resizing the viewport.
Pull Request: https://projects.blender.org/blender/blender/pulls/137002
Followup to 48e26c3afe, and discussions in !134771 about keeping
'C-style' and 'C++ template type-safe style' implementations of our
guardedalloc separated. And it makes `MEM_freeN<T>` code simpler.
Also skip type-checking in `MEM_freeN<T>` only with MSVC, as clang-cl on
windows-arm64 does work fine with DNA structs using
`DNA_DEFINE_CXX_METHODS`.
Pull Request: https://projects.blender.org/blender/blender/pulls/134861
Previous implementation used the resource state tracker which is a hash
table lookup. `is_link_to_buffer` is a bit cheaper as it is compares
already loaded data.
This PR changes the resource locking when reordering render graph
nodes. Reordering could be done without locking resources. No measurable
speedup detected.
Pull Request: https://projects.blender.org/blender/blender/pulls/134032
In renderdoc the debug stack got corrupted when render graphs where
reused. The previous usage didn't clear the stack. This PR clears
the debug stack when render graphs are reset.
This PR implements a new the threading model for building render graphs
based on tests performed last month. For out workload multithreaded
command building will block in the driver or device. So better to use a
single thread for command building.
Details of the internal working is documented at https://developer.blender.org/docs/features/gpu/vulkan/render_graph/
- When a context is activated on a thread the context asks for a
render graph it can use by calling `VKDevice::render_graph_new`.
- Parts of the GPU backend that requires GPU commands will add a
specific render graph node to the render graph. The nodes also
contains a reference to all resources it needs including the
access it needs and the image layout.
- When the context is flushed the render graph is submitted to the
device by calling `VKDevice::render_graph_submit`.
- The device puts the render graph in `VKDevice::submission_pool`.
- There is a single background thread that gets the next render
graph to send to the GPU (`VKDevice::submission_runner`).
- Reorder the commands of the render graph to comply with Vulkan
specific command order rules and reducing possible bottlenecks.
(`VKScheduler`)
- Generate the required barriers `VKCommandBuilder::groups_extract_barriers`.
This is a separate step to reduce resource locking giving other
threads access to the resource states when they are building
the render graph nodes.
- GPU commands and pipeline barriers are recorded to a VkCommandBuffer.
(`VKCommandBuilder::record_commands`)
- When completed the command buffer can be submitted to the device
queue. `vkQueueSubmit`
- Render graphs that have been submitted can be reused by a next
thread. This is done by pushing the render graph to the
`VKDevice::unused_render_graphs` queue.
Pull Request: https://projects.blender.org/blender/blender/pulls/132681
VKRenderGraphNode is 892 bytes and most of the bytes are used for
specific nodes. By storing large structs in separate vectors we can
reduce the needed memory and improve cache pre-fetching.
With this change the VKRenderGraphNode is reduced to 64 bytes.
On a (50 frames shader_balls.blend) the end user performance is improved by
2%.
| **Platform** | **Before** | **After** |
| ---------------- | ---------- | --------- |
| AMD W7700 | 1409 ms | 1383 ms |
| NVIDIA RTX 6000 | 1443 ms | 1428 ms |
Pull Request: https://projects.blender.org/blender/blender/pulls/133317
Images used to be tracked with ownership in order to reset swap chain
images to its original layout. This isn't used anymore as we always mark
them in VK_IMAGE_LAYOUT_UNDEFINED to make the first pipeline barrier a
nop.
This change reduces unneeded complexity and safe a few CPU cycles.
Pull Request: https://projects.blender.org/blender/blender/pulls/133197
Initial design had a more complex use case for render graphs.
They are not really used and will not in the near term. This PR
removes some code that doesn't do a thing
Pull Request: https://projects.blender.org/blender/blender/pulls/133047
Pipeline barriers were extracted when recording commands. This works,
but had the downside that it locked the device resources. Extracting
pipeline barriers is fairly small task compared to recording commands.
This PR will perform the extraction of pipelines separate from command
recording. Code is easier to follow and when working with multiple threads
this will reduce locking (enabling this will be done in separate PR).
Original developed in !131965
Pull Request: https://projects.blender.org/blender/blender/pulls/132989
This will add support for `VK_KHR_dynamic_rendering_local_read` when supported.
The extension allows reading from an attachment that has been written to by a
previous command.
Per platform optimizations still need to happen in future changes. Change will
be limited to Qualcomm devices (in a future commit).
On Qualcomm devices this provides an uplift of 16% when using shader_balls.blend
Pull Request: https://projects.blender.org/blender/blender/pulls/131053
Reduces the number of times a graphic context needs to be paused/resumed.
The scheduler reorders the nodes to put these initial data transfer nodes
to the start of the nodes that are about to be submitted.
Pull Request: https://projects.blender.org/blender/blender/pulls/131502
When copying from/to depth stencil images the aspect of the image was
set incorrectly. It pointed to only one aspect of the full image when
building resource links, which could lead to incorrect decisions in the
driver.
Found when researching #131269
Pull Request: https://projects.blender.org/blender/blender/pulls/131298
When lighting baking is used in a background render the resources are
freed to early. The cause is that light baking does some initialization
within a context, that isn't send to the GPU. The first iteration of
light baking is expecting that it can free resources, what leads to GPU
resources to be deleted that are still used by commands that are
scheduled to be send to the GPU.
This PR fixes this by using multiple resource pools when background
rendering and ensure that contexts are send to the GPU when rendering
ends.
Pull Request: https://projects.blender.org/blender/blender/pulls/131094
When debugging render graph the debug group name can narrow down the
place where a node originates from. This PR adds a function to retrieve
the full debug group name of a specific node.
Pull Request: https://projects.blender.org/blender/blender/pulls/131081
Layer tracking allows modifying specific layers of a bound texture to a
different layout. This was only supported when suspending/resuming was
not needed. However when using complex scenes EEVEE can trigger suspend/
resume rendering scopes. This resulted into several validation warnings
as images where in the incorrect state.
Fixes validation warnings:
- rain_restaurant.blend
- classroom.blend
Pull Request: https://projects.blender.org/blender/blender/pulls/130957
Currently `VkCommandBuffer` are allocated with `vkAllocateCommandBuffers`,
however when the commands are submitted pointer are just reset, these leaks
are visible with just play back animations.
This frees commands buffers resolving the leaks.
Ref: #127225 Although related we should check the result as there could be more causes.
Pull Request: https://projects.blender.org/blender/blender/pulls/130203
Copying editors to the swap chain is done by a series of draw and copy.
When doing draw, copy, draw the swap chain layout was not matching the
draw command as it resumed previous rendering.
This is solved by validating the pipeline barriers when resuming rendering.
There are also other cases that required this which have been updated.
Pull Request: https://projects.blender.org/blender/blender/pulls/130721
Dynamic rendering is a Vulkan 1.3 feature. Most platforms have support
for them, but there are several legacy platforms that don't support dynamic
rendering or have driver bugs that don't allow us to use it.
This change will make dynamic rendering optional allowing legacy
platforms to use Vulkan.
**Limitations**
`GPU_LOADACTION_CLEAR` is implemented as clear attachments.
Render passes do support load clear, but adding support to it would
add complexity as it required multiple pipeline variations to support
suspend/resume rendering. It isn't clear when which variation should
be used what lead to compiling to many pipelines and branches in the
codebase. Using clear attachments doesn't require the complexity
for what is expected to be only used by platforms not supported by
the GPU vendors.
Subpass inputs and dual source blending are not supported as
Subpass inputs can alter the exact binding location of attachments.
Fixing this would add code complexity that is not used.
Ref: #129063
**Current state**

Pull Request: https://projects.blender.org/blender/blender/pulls/129062
WITH_VULKAN_GUARDEDALLOC is a development option to use Blenders guarded
allocator when allocating internal vulkan driver resources. It does not provide any benefits
as this should be covered by vulkan validation and drivers are often ignoring this. This
change will remove the option from cmake and source code.
Pull Request: https://projects.blender.org/blender/blender/pulls/129039
Cycles uses multiple threads to send commands to the GPU. The current
command buffer structure assumed that all commands from the same context
were send via the same thread. This wasn't the case and could lead to
recording commands to command buffers that are still pending (preparing
commands to send to GPU).
This is fixed by creating a command buffer each time a render graph
submits its work.
Detected when researching #128608
Pull Request: https://projects.blender.org/blender/blender/pulls/128978
When using cycles in the viewport there it uses render threads and
workers to update the viewport. All threads can record commands to the
queue and needs to be synchronized. This didn't happen leading to
incorrect renders.
Detected when researching #128608
Pull Request: https://projects.blender.org/blender/blender/pulls/128975
Multiple threads can access the same device queue from different
threads. This could happen when doing a cycles preview render, baking
eevee volume probes or generating material previews.
This PR adds a mutex around access to the device queues.
Detected when researching #128608
Pull Request: https://projects.blender.org/blender/blender/pulls/128974
Related to #126974, which removed command reordering due to some
EEVEE/framebuffer requirements. However buffer can still be reordered
without any artifacts.
Update buffers are common operations and are often isolated; safe to
move them outside the rendering scope.
Pull Request: https://projects.blender.org/blender/blender/pulls/128538
This change adds the option to update a buffer via the render
graph via `vkCmdUpdateBuffer`. This is only enabled for
uniform buffers as they are small and aligned/sized correctly.
Pull Request: https://projects.blender.org/blender/blender/pulls/128416
Solved by not supporting complex reordering at this moment. It
is currently better to focus on quality and add back performance
later. During tests I didn't detect any user noticeable performance
degradation. Could also be because we support rendering
suspending/resuming.
Fixes artifacts in EEVEE volumes and raytracing.
Pull Request: https://projects.blender.org/blender/blender/pulls/126974
Fix crash when using EEVEE irradiance baking. When reading back the
intermediate result the active rendering was not ended, resulting
in an assert as the rendergraph is cleared and assumed to be in an
initial state (not rendering).
Pull Request: https://projects.blender.org/blender/blender/pulls/126688