The previous implementation used the resource state tracker, which requires a
hash table lookup. `is_link_to_buffer` is slightly cheaper as it compares
data that is already loaded.
This PR changes the resource locking when reordering render graph
nodes: reordering can be done without locking resources. No measurable
speedup was detected.
Pull Request: https://projects.blender.org/blender/blender/pulls/134032
This PR fixes an issue where shader compilation could stall. This could
be seen in the viewport (sometimes the first EEVEE render was not shown) but
was more prominent when running test cases.
Pull Request: https://projects.blender.org/blender/blender/pulls/134020
VkBufferViews could be used after they were freed, because they were not
managed by the discard pool. This was detected when looking into failing
render tests (pointcloud_motion.blend).
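A discard pool defers destroying a handle until the GPU has finished every submission that may still reference it. The following is a minimal sketch that assumes each submission is identified by a monotonically increasing timeline value. `DiscardPool`, `BufferViewHandle` and the destroy callback are hypothetical stand-ins for the real `VkBufferView` handling, where the callback would call `vkDestroyBufferView`. None of them are Blender's actual API.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a VkBufferView handle.
using BufferViewHandle = std::uint64_t;

// Defers destruction of handles until the GPU has completed the submission
// that may still use them (sketch, not Blender's VKDiscardPool).
class DiscardPool {
 public:
  explicit DiscardPool(std::function<void(BufferViewHandle)> destroy_fn)
      : destroy_fn_(std::move(destroy_fn))
  {
  }

  // Instead of destroying immediately, remember the handle together with the
  // timeline value of the last submission that could reference it.
  void discard_buffer_view(BufferViewHandle handle, std::uint64_t timeline)
  {
    pending_.push_back({handle, timeline});
  }

  // Destroy every handle whose submission has completed on the GPU.
  void destroy_discarded(std::uint64_t completed_timeline)
  {
    std::vector<Entry> still_pending;
    for (const Entry &entry : pending_) {
      if (entry.timeline <= completed_timeline) {
        destroy_fn_(entry.handle);
      }
      else {
        still_pending.push_back(entry);
      }
    }
    pending_ = std::move(still_pending);
  }

  std::size_t pending_count() const
  {
    return pending_.size();
  }

 private:
  struct Entry {
    BufferViewHandle handle;
    std::uint64_t timeline;
  };
  std::function<void(BufferViewHandle)> destroy_fn_;
  std::vector<Entry> pending_;
};
```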
This part of the API is used by motion blur in EEVEE. Fixes the following
render tests:
- `eevee_next_motion_blur_vulkan`
- `eevee_next_pointcloud_vulkan`
- `eevee_next_hair_vulkan`
Related: #133546
Pull Request: https://projects.blender.org/blender/blender/pulls/133856
In RenderDoc the debug stack got corrupted when render graphs were
reused, because the previous usage didn't clear the stack. This PR clears
the debug stack when render graphs are reset.
This PR implements a new threading model for building render graphs,
based on tests performed last month. For our workload, multithreaded
command building blocks in the driver or device, so it is better to use
a single thread for command building.
Details of the internal workings are documented at https://developer.blender.org/docs/features/gpu/vulkan/render_graph/
- When a context is activated on a thread, the context asks for a
  render graph it can use by calling `VKDevice::render_graph_new`.
- Parts of the GPU backend that require GPU commands add a
  specific render graph node to the render graph. Each node also
  contains a reference to all resources it needs, including the
  required access and image layout.
- When the context is flushed, the render graph is submitted to the
  device by calling `VKDevice::render_graph_submit`.
- The device puts the render graph in `VKDevice::submission_pool`.
- There is a single background thread that gets the next render
  graph to send to the GPU (`VKDevice::submission_runner`). It:
  - Reorders the commands of the render graph to comply with
    Vulkan-specific command order rules and to reduce possible
    bottlenecks (`VKScheduler`).
  - Generates the required barriers (`VKCommandBuilder::groups_extract_barriers`).
    This is a separate step to reduce resource locking, giving other
    threads access to the resource states while they are building
    the render graph nodes.
  - Records GPU commands and pipeline barriers to a VkCommandBuffer
    (`VKCommandBuilder::record_commands`).
  - When completed, submits the command buffer to the device
    queue (`vkQueueSubmit`).
- Render graphs that have been submitted can be reused by the next
  thread. This is done by pushing the render graph to the
  `VKDevice::unused_render_graphs` queue.
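The flow above can be sketched in plain C++ under simplified assumptions. `RenderGraph`, `BlockingQueue` and `Device` are hypothetical stand-ins for `VKRenderGraph` and `VKDevice`. The scheduling, barrier extraction, recording and `vkQueueSubmit` steps are stubbed out as a node counter. The sketch shows the main design choice: many threads produce render graphs, while a single runner thread consumes them and returns them to a pool of unused graphs for reuse.

```cpp
#include <atomic>
#include <condition_variable>
#include <deque>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for VKRenderGraph: only stores node ids.
struct RenderGraph {
  std::vector<int> nodes;
  void reset() { nodes.clear(); }
};
using RenderGraphPtr = std::unique_ptr<RenderGraph>;

// Minimal thread-safe FIFO queue.
template<typename T> class BlockingQueue {
 public:
  void push(T value)
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      items_.push_back(std::move(value));
    }
    cv_.notify_one();
  }
  // Blocks until an item is available.
  T pop()
  {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !items_.empty(); });
    T value = std::move(items_.front());
    items_.pop_front();
    return value;
  }
  // Non-blocking: returns false when the queue is empty.
  bool try_pop(T &r_value)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    if (items_.empty()) {
      return false;
    }
    r_value = std::move(items_.front());
    items_.pop_front();
    return true;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<T> items_;
};

class Device {
 public:
  explicit Device(std::atomic<int> &r_submitted_nodes)
      : submitted_nodes_(r_submitted_nodes), runner_([this] { submission_runner(); })
  {
  }
  ~Device()
  {
    submission_pool_.push(nullptr);  // Sentinel, queued after all earlier graphs.
    runner_.join();
  }
  // Called when a context is activated: reuse a graph when one is available.
  RenderGraphPtr render_graph_new()
  {
    RenderGraphPtr graph;
    if (unused_render_graphs_.try_pop(graph)) {
      return graph;
    }
    return std::make_unique<RenderGraph>();
  }
  // Called when a context is flushed; returns immediately.
  void render_graph_submit(RenderGraphPtr graph)
  {
    submission_pool_.push(std::move(graph));
  }

 private:
  // The single background thread that turns render graphs into GPU work.
  void submission_runner()
  {
    while (RenderGraphPtr graph = submission_pool_.pop()) {
      // Stub for: reorder nodes, extract barriers, record commands, vkQueueSubmit.
      submitted_nodes_ += static_cast<int>(graph->nodes.size());
      graph->reset();
      unused_render_graphs_.push(std::move(graph));
    }
  }

  std::atomic<int> &submitted_nodes_;
  BlockingQueue<RenderGraphPtr> submission_pool_;
  BlockingQueue<RenderGraphPtr> unused_render_graphs_;
  std::thread runner_;  // Declared last so the queues exist before the thread starts.
};
```

Because the queue is FIFO, the `nullptr` sentinel pushed by the destructor is only handled after all previously submitted graphs.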
Pull Request: https://projects.blender.org/blender/blender/pulls/132681
The Vulkan shader compiler accesses the cache folder from multiple threads.
The GHOST part isn't thread safe and can overwrite a cache path it has
already returned. This resulted in crashes when performing background
rendering, failing test cases, loading of incorrect shaders, etc.
This PR fixes this by caching the cache folder location in the
VKShaderCompiler; the location is loaded on the main thread when the
Vulkan backend is initialized.
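One way to sketch the fix is to read the location once during initialization and afterwards only hand out an immutable copy. `query_cache_dir_not_thread_safe`, `ShaderCompiler` and the `.spv` path layout are hypothetical illustrations. They are not the GHOST or VKShaderCompiler API.

```cpp
#include <string>

// Hypothetical stand-in for the GHOST query. Writing to shared state like this
// buffer is what makes concurrent calls unsafe.
std::string query_cache_dir_not_thread_safe()
{
  static std::string buffer;
  buffer = "/tmp/blender/cache";
  return buffer;
}

class ShaderCompiler {
 public:
  // Constructed on the main thread when the backend is initialized.
  ShaderCompiler() : cache_dir_(query_cache_dir_not_thread_safe()) {}

  // Safe to call from any compile thread: it only reads an immutable copy.
  std::string cache_path_for(const std::string &shader_name) const
  {
    return cache_dir_ + "/" + shader_name + ".spv";
  }

 private:
  const std::string cache_dir_;
};
```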
Pull Request: https://projects.blender.org/blender/blender/pulls/133535
Memory areas were requested to be preferably host visible. On some
platforms this would fail to allocate. It is better not to request host
visible memory as a preference for typically large allocations.
This PR also makes the caller responsible for setting the allocation flags.
Pull Request: https://projects.blender.org/blender/blender/pulls/133528
Cycles uses pixel buffers to update the display. To get things working,
the Vulkan backend downloaded the GPU-allocated pixel buffer to the CPU,
copied it to a GPU-allocated staging buffer and updated the display
texture using the staging buffer. Needless to say, a (CPU->)GPU->CPU->GPU
round trip is a bottleneck.
This PR fixes this by allowing the pixel buffer to act as a staging
buffer as well.
Viewport and final image rendering performance with Vulkan is now similar to OpenGL.
| **Render** | **GPU Backend** | **Path tracing** | **Display** |
| ---------- | --------------- | ---------------- | ----------- |
| Viewport | OpenGL | 2.7 | 0.06 |
| Viewport | Vulkan | 2.7 | 0.04 |
| Image | OpenGL | 3.9 | 0.02 |
| Image | Vulkan | 3.9 | 0.02 |
Tested on:
```
Operating system: Linux-6.8.0-49-generic-x86_64-with-glibc2.39 64 Bits, X11 UI
Graphics card: AMD Radeon Pro W7700 (RADV NAVI32) Advanced Micro Devices radv Mesa 24.3.1 - kisak-mesa PPA Vulkan Backend
```
Pull Request: https://projects.blender.org/blender/blender/pulls/133485
VKRenderGraphNode is 892 bytes and most of those bytes are only used by
specific node types. By storing large structs in separate vectors we can
reduce the needed memory and improve cache prefetching.
With this change VKRenderGraphNode is reduced to 64 bytes.
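The following is a simplified illustration of storing large per-type payloads out of line. The node keeps only its type and an index into a per-type vector, so iterating over nodes touches far less memory. The payload structs, their sizes and `RenderGraphStorage` are hypothetical. The real 64-byte VKRenderGraphNode holds more than a type and an index.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical large per-type payloads (sizes chosen for illustration only).
struct DispatchData {
  std::uint32_t group_count[3];
  std::uint8_t push_constants[128];
};
struct DrawData {
  std::uint32_t vertex_count, instance_count, first_vertex, first_instance;
  std::uint8_t push_constants[128];
  std::uint8_t pipeline_state[256];
};

enum class NodeType : std::uint8_t { Dispatch, Draw };

// Compact node: the type plus an index into the matching per-type vector.
struct Node {
  NodeType type;
  std::uint32_t data_index;
};

struct RenderGraphStorage {
  std::vector<Node> nodes;
  std::vector<DispatchData> dispatches;
  std::vector<DrawData> draws;

  void add_dispatch(const DispatchData &data)
  {
    nodes.push_back({NodeType::Dispatch, static_cast<std::uint32_t>(dispatches.size())});
    dispatches.push_back(data);
  }
  void add_draw(const DrawData &data)
  {
    nodes.push_back({NodeType::Draw, static_cast<std::uint32_t>(draws.size())});
    draws.push_back(data);
  }
};

// Walking `nodes` only touches small entries; payloads are fetched on demand.
static_assert(sizeof(Node) <= 8, "Node should stay small");
```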
When rendering 50 frames of shader_balls.blend, end user performance improves by
up to 2% (1.8% on the AMD W7700, 1.0% on the NVIDIA RTX 6000).
| **Platform** | **Before** | **After** |
| ---------------- | ---------- | --------- |
| AMD W7700 | 1409 ms | 1383 ms |
| NVIDIA RTX 6000 | 1443 ms | 1428 ms |
Pull Request: https://projects.blender.org/blender/blender/pulls/133317
Images used to be tracked with ownership in order to reset swap chain
images to their original layout. This isn't used anymore, as we always mark
them as VK_IMAGE_LAYOUT_UNDEFINED to make the first pipeline barrier a
no-op.
This change removes unneeded complexity and saves a few CPU cycles.
Pull Request: https://projects.blender.org/blender/blender/pulls/133197
This fixes a rendering issue when local read is enabled.
Before the fix, the output image was too bright due to incorrect load/store operations.
With this fix, the logic for attachment load/store ops with local_read on matches the logic with local_read off inside subpass_transition_impl...
Pull Request: https://projects.blender.org/blender/blender/pulls/133111
Blender always updates all pixels of the swap chain. As an optimization
we can skip the initial layout transition from present to transfer
destination as all pixels will be rewritten.
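In Vulkan, a layout transition that uses `VK_IMAGE_LAYOUT_UNDEFINED` as its old layout tells the driver that the previous contents may be discarded. That is valid when every pixel is about to be rewritten. The following is a minimal sketch of that decision. It uses a hypothetical `ImageLayout` enum that mirrors the relevant `VkImageLayout` values so the example compiles without Vulkan headers.

```cpp
// Hypothetical mirror of the relevant VkImageLayout values.
enum class ImageLayout { Undefined, PresentSrc, TransferDst };

struct LayoutTransition {
  ImageLayout old_layout;
  ImageLayout new_layout;
};

// When all pixels will be overwritten, the old contents don't matter, so the
// transition can start from Undefined instead of the current layout.
LayoutTransition swapchain_transition(ImageLayout current, bool overwrites_all_pixels)
{
  return {overwrites_all_pixels ? ImageLayout::Undefined : current, ImageLayout::TransferDst};
}
```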
Pull Request: https://projects.blender.org/blender/blender/pulls/133061
Swizzling is supported when sampling. Outside samplers the swizzling
must always be the initial swizzling.
Detected when playing rain_restaurant.blend. EEVEE motion vectors use
swizzling.
Pull Request: https://projects.blender.org/blender/blender/pulls/133043
The initial design had a more complex use case for render graphs.
That use case is not really needed and will not be in the near term. This PR
removes code that has no effect.
Pull Request: https://projects.blender.org/blender/blender/pulls/133047
Pipeline barriers were extracted when recording commands. This works,
but had the downside that it kept the device resources locked while recording.
Extracting pipeline barriers is a fairly small task compared to recording commands.
This PR performs the extraction of pipeline barriers separately from command
recording. The code is easier to follow and, when working with multiple threads,
this will reduce locking (enabling this will be done in a separate PR).
Originally developed in !131965
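Extracting barriers under a short lock and then recording without it can be sketched as follows. This is a simplified model that assumes resources are plain ids and that only their last access (read or write) is tracked. `ResourceStates`, `extract_barriers` and `record_commands` are hypothetical. They ignore the image layouts and access masks that real Vulkan barriers need.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

enum class Access { None, Read, Write };

struct Command {
  int resource;
  Access access;
};

struct Barrier {
  std::size_t before_command;  // Index of the command the barrier precedes.
  int resource;
};

// Shared between threads; protected by `mutex`.
struct ResourceStates {
  std::mutex mutex;
  std::unordered_map<int, Access> last_access;
};

// Step 1: short critical section that only reads and updates resource states.
std::vector<Barrier> extract_barriers(ResourceStates &states, const std::vector<Command> &commands)
{
  std::lock_guard<std::mutex> lock(states.mutex);
  std::vector<Barrier> barriers;
  for (std::size_t i = 0; i < commands.size(); i++) {
    Access &previous = states.last_access[commands[i].resource];
    // A barrier is needed when a write is involved (RAW, WAW or WAR hazards).
    if (previous == Access::Write || (previous == Access::Read && commands[i].access == Access::Write)) {
      barriers.push_back({i, commands[i].resource});
    }
    previous = commands[i].access;
  }
  return barriers;
}

// Step 2: recording runs without the lock, so other threads can keep
// updating resource states meanwhile. Here "recording" just builds a log.
std::vector<std::string> record_commands(const std::vector<Command> &commands,
                                         const std::vector<Barrier> &barriers)
{
  std::vector<std::string> recorded;
  std::size_t next_barrier = 0;
  for (std::size_t i = 0; i < commands.size(); i++) {
    while (next_barrier < barriers.size() && barriers[next_barrier].before_command == i) {
      recorded.push_back("barrier " + std::to_string(barriers[next_barrier].resource));
      next_barrier++;
    }
    recorded.push_back("command " + std::to_string(commands[i].resource));
  }
  return recorded;
}
```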
Pull Request: https://projects.blender.org/blender/blender/pulls/132989
Only enable dynamic rendering local read by default on Qualcomm devices. On NVIDIA, AMD and Intel,
performance is better when it is disabled (20%). On Qualcomm devices the improvement can be
substantial (16% on shader_balls.blend).
`--debug-gpu-vulkan-local-read` can be used to enable dynamic rendering local read on any
supported platform.
Future: check whether the bottleneck is in command building. If so, we could fine-tune this after
device command building has landed (#T132682).
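The default policy can be written as a small predicate. This sketch is hypothetical: Blender may detect the platform in a different way than by comparing `VkPhysicalDeviceProperties::vendorID`. The value 0x5143 is the vendor ID that Qualcomm devices report.

```cpp
#include <cstdint>

// Vendor ID reported in VkPhysicalDeviceProperties::vendorID by Qualcomm GPUs.
constexpr std::uint32_t kVendorQualcomm = 0x5143;

// Hypothetical helper: the extension must be supported; it is enabled by
// default only on Qualcomm, unless --debug-gpu-vulkan-local-read is passed.
bool use_dynamic_rendering_local_read(std::uint32_t vendor_id,
                                      bool extension_supported,
                                      bool debug_flag_set)
{
  if (!extension_supported) {
    return false;
  }
  return vendor_id == kVendorQualcomm || debug_flag_set;
}
```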
Pull Request: https://projects.blender.org/blender/blender/pulls/132981
This adds support for `VK_KHR_dynamic_rendering_local_read` when available.
The extension allows reading from an attachment that has been written to by a
previous command.
Per-platform optimizations still need to happen in future changes. The feature
will be limited to Qualcomm devices (in a future commit).
On Qualcomm devices this provides an uplift of 16% when using shader_balls.blend.
Pull Request: https://projects.blender.org/blender/blender/pulls/131053
The selection shader changes the provoking vertex for edge selection.
Using a non-default provoking vertex was not implemented in the Vulkan backend,
resulting in a different edge being selected than expected.
Pull Request: https://projects.blender.org/blender/blender/pulls/132729
This PR makes timeline semaphores a requirement. They aren't used
yet, but as multiple upcoming developments will rely on them, it is
better to add the requirement now.
Pull Request: https://projects.blender.org/blender/blender/pulls/132683
Framebuffers were freed in the GPUContext base class destructor. However,
the framebuffer destructors use the MTL/VK/GLContext derived class, whose
destructor has already completed at this point, so these contexts are no
longer valid to use.
The framebuffers are now freed earlier.
This caused ASAN warnings; it is not known to have caused actual bugs.
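The destruction-order problem can be reproduced in plain C++, because a base class destructor always runs after the derived class destructor has finished. In this hypothetical sketch, `Context`, `VKContext` and `FrameBuffer` are simplified stand-ins for GPUContext and its subclasses. `derived_alive` models whether the derived state is still valid.

```cpp
#include <memory>
#include <string>
#include <vector>

std::vector<std::string> g_log;  // Records what each framebuffer saw when freed.

struct Context;

struct FrameBuffer {
  explicit FrameBuffer(Context *owner) : context(owner) {}
  ~FrameBuffer();
  Context *context;
};

struct Context {
  std::vector<std::unique_ptr<FrameBuffer>> framebuffers;
  // Models "the derived context is still usable"; maintained by the subclass.
  bool derived_alive = false;

  virtual ~Context()
  {
    // Runs AFTER the derived destructor has completed.
    free_framebuffers();
  }

  void free_framebuffers()
  {
    framebuffers.clear();
  }
};

FrameBuffer::~FrameBuffer()
{
  // A real framebuffer would call into the derived context here.
  g_log.push_back(context->derived_alive ? "valid context" : "destroyed context");
}

struct VKContext : Context {
  explicit VKContext(bool free_early) : free_early(free_early)
  {
    derived_alive = true;
  }

  ~VKContext() override
  {
    if (free_early) {
      free_framebuffers();  // The fix: free while the derived context is valid.
    }
    derived_alive = false;  // From here on the derived part is gone.
  }

  bool free_early;
};
```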
Pull Request: https://projects.blender.org/blender/blender/pulls/132504
- Using a main function allows the scripts to be imported without
executing logic.
- Declaring `__all__` lets tools such as "vulture" detect unused code.
Ensure `gl_ViewportIndex` and `gl_Layer` are properly forwarded from the
geometry shader, and don't write to them from the vertex shader if
there's a geometry shader stage.
Fixes the Displacement "dicing" render tests on Nvidia OpenGL.
Pull Request: https://projects.blender.org/blender/blender/pulls/131875
Blender stores all pipelines in a pool. Using a hash, it checks whether
the pipeline was already created so the previous one can be reused. Due
to performance issues when working with graphics pipelines, some equality
operations only used a hash check. For scissors and viewports this isn't
enough and could lead to a pipeline with different scissors or viewports being reused.
This PR fixes this by still performing an exact comparison when the hashes
are equal. Note that performance drops a bit; this should be countered
by other performance improvements in the future.
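The following sketches a pool that uses the hash only to find candidates and then confirms a match with an exact comparison. `PipelineKey` and `PipelinePool` are hypothetical and are not Blender's pipeline pool. The hash deliberately ignores viewports and scissors so that a collision is guaranteed.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical, heavily reduced pipeline key.
struct Rect {
  int x, y, width, height;
};

struct PipelineKey {
  std::uint64_t shader_id;
  std::vector<Rect> viewports;
  std::vector<Rect> scissors;

  // Deliberately weak hash: ignores viewports and scissors, so two different
  // keys can share the same hash.
  std::size_t hash() const
  {
    return std::hash<std::uint64_t>{}(shader_id);
  }
};

bool operator==(const Rect &a, const Rect &b)
{
  return a.x == b.x && a.y == b.y && a.width == b.width && a.height == b.height;
}

// Exact comparison of every field.
bool operator==(const PipelineKey &a, const PipelineKey &b)
{
  return a.shader_id == b.shader_id && a.viewports == b.viewports && a.scissors == b.scissors;
}

struct Pipeline {
  PipelineKey key;  // A real pipeline would also own a VkPipeline handle.
};

class PipelinePool {
 public:
  // The hash only selects a bucket; reuse requires an exact key match.
  const Pipeline &get_or_create(const PipelineKey &key)
  {
    std::vector<std::unique_ptr<Pipeline>> &bucket = buckets_[key.hash()];
    for (const std::unique_ptr<Pipeline> &pipeline : bucket) {
      if (pipeline->key == key) {
        return *pipeline;
      }
    }
    bucket.push_back(std::make_unique<Pipeline>(Pipeline{key}));
    created_count_++;
    return *bucket.back();
  }

  int created_count() const
  {
    return created_count_;
  }

 private:
  std::unordered_map<std::size_t, std::vector<std::unique_ptr<Pipeline>>> buckets_;
  int created_count_ = 0;
};
```

The exact comparison only runs when the hashes match, which keeps its cost low while ruling out false reuse.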
Pull Request: https://projects.blender.org/blender/blender/pulls/132005
This adds initial support for ReBAR-capable platforms.
When allocating buffers that are not required to be host visible, it still
tries to allocate them in host-visible memory. When there is space in this
memory heap, the buffer will automatically be mapped to host memory.
When a newly created buffer is mapped, staging buffers can be skipped. To make
better use of ReBAR, the `VKBuffer::create` function will need to be revisited;
it currently hides too many options needed to allocate in the correct memory
heap. That change isn't part of this PR.
Rendering the first 50 frames of shader_balls.blend takes 1516 ms in main and
1416 ms with ReBAR (about 6.6% less time). Tested on:
```
Operating system: Linux-6.8.0-49-generic-x86_64-with-glibc2.39 64 Bits, X11 UI
Graphics card: AMD Radeon Pro W7700 (RADV NAVI32) Advanced Micro Devices radv Mesa 24.3.1 - kisak-mesa PPA Vulkan Backend
```
Pull Request: https://projects.blender.org/blender/blender/pulls/131856
MoltenVK dynamic rendering is built on top of render passes and framebuffers.
This also means that dynamic rendering has the same limitations.
This PR enables the workarounds for gaps between attachments for MoltenVK.
Pull Request: https://projects.blender.org/blender/blender/pulls/131816
Rendering animations from Python scripts via `bpy.ops.render.opengl()`
did not trigger any of the notifications in the Metal back-end to
indicate a frame had been rendered and that the associated resources
could be released. This adds a call to `GPU_render_step()` after each
render. For the original asset in the bug report, this reduces the memory
high watermark from 30 GB to 13 GB for 500 frames. 13 GB is likely
still too high, so there are probably additional leaks that need to be
addressed; this should only be considered a partial fix.
Authored by Apple: James McCarthy
Co-authored-by: James McCarthy <jamesmccarthy@apple.com>
Co-authored-by: Clément Foucault <foucault.clem@gmail.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/131085
This enables moving elements from one vector to another.
Usually this is done by extending a vector with the content
of the secondary vector and then clearing the secondary vector.
However, when this is done to transfer ownership of managed elements and
the elements are copied from the secondary vector without clearing it,
both vectors end up sharing ownership of the same objects.
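The C++ standard library can express the same idea with move iterators. `extend_move` is a hypothetical helper and is not the API this PR adds to Blender's `Vector`.

```cpp
#include <iterator>
#include <memory>  // For the std::unique_ptr elements used with this helper.
#include <vector>

// Moves all elements of `source` to the end of `dest` and leaves `source`
// empty, so ownership is never shared between the two vectors.
template<typename T> void extend_move(std::vector<T> &dest, std::vector<T> &source)
{
  dest.insert(dest.end(),
              std::make_move_iterator(source.begin()),
              std::make_move_iterator(source.end()));
  source.clear();
}
```

With `std::unique_ptr` elements the ownership problem shows up at compile time: the elements cannot be copied, so they have to be moved and the source cleared.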
Pull Request: https://projects.blender.org/blender/blender/pulls/131560
NVIDIA drivers before 550 don't work as expected on Linux. The issue is
known online, but no real solutions have been provided.
This change limits the block list of older drivers to the Linux
platform only. It has been reported that these drivers work on Windows,
so this enables older GPUs to work on Windows.
Ref #129160
Pull Request: https://projects.blender.org/blender/blender/pulls/131674
Vulkan 1.2 supports specifying the workgroup size with the WorkgroupSize
built-in. Vulkan 1.3 introduced support for the LocalSizeId execution mode,
which has been backported in the VK_KHR_maintenance4 extension. Starting
with SPIR-V 1.6 the WorkgroupSize built-in is deprecated.
This PR checks the availability of the VK_KHR_maintenance4 extension and,
when it is enabled, compiles the shaders targeting Vulkan 1.3. This
automatically uses the LocalSizeId execution mode.
See https://registry.khronos.org/vulkan/specs/latest/man/html/WorkgroupSize.html
Pull Request: https://projects.blender.org/blender/blender/pulls/131663