Commit Graph

621 Commits

Author SHA1 Message Date
Clément Foucault
9990273d04 GPU: Change Type enum to use lower case values
This is to help for future resource declaration
using macros.

Rel #137261

Pull Request: https://projects.blender.org/blender/blender/pulls/137367
2025-04-11 22:39:01 +02:00
Josh Belanich
3c70758f00 Fix #137081: Vulkan: Crash during animation playback
A couple of memory leak fixes for the vulkan backend.

We increment the submission_id on render_graphs upon reset. This
triggers cleanup of anything tracked as a VKResourceTracker. Notably
uniform buffers created for push constant fallbacks. This fixes a memory
leak that was accumulating VKUniformBuffers every frame without cleaning
them up.

Reset resource pools when a swapchain image is presented. This ends up
calling vkResetDescriptorPool, freeing up descriptor set resources. This
fixes a memory leak that was accumulating descriptor sets and pools over
time without freeing them.

Pull Request: https://projects.blender.org/blender/blender/pulls/137305
2025-04-11 14:46:35 +02:00
Jeroen Bakker
b65b6febb9 Fix: Vulkan/OpenXR: Use correct data format for CPU transfers
An incorrect data format was selected when using CPU data transfers in
OpenXR. It always used `GPU_DATA_HALF_FLOAT`, even when the swapchains
were `GPU_RGBA8`. This resulted in black screens in release mode, and
asserts in debug mode.

Fixed by selecting the correct data transfer data type based on the
swapchain format.
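
The selection described above can be pictured as a lookup from swapchain format to transfer data type. The sketch below is illustrative only: the `SwapchainFormat` and `TransferDataType` enums and the `transfer_data_type_for` helper are hypothetical stand-ins, and mapping 8-bit swapchains to an unsigned byte type is an assumption, not something the commit message states.

```cpp
#include <cassert>

// Hypothetical stand-ins for the swapchain formats and CPU transfer data types.
enum class SwapchainFormat { RGBA8, RGBA16F };
enum class TransferDataType { UByte, HalfFloat };

// Pick the CPU transfer data type from the swapchain format instead of
// unconditionally using half floats.
TransferDataType transfer_data_type_for(SwapchainFormat format)
{
  switch (format) {
    case SwapchainFormat::RGBA8:
      /* Assumption: 8-bit channels are transferred as unsigned bytes. */
      return TransferDataType::UByte;
    case SwapchainFormat::RGBA16F:
      return TransferDataType::HalfFloat;
  }
  assert(false && "unhandled swapchain format");
  return TransferDataType::HalfFloat;
}
```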

Co-authored-by: jeroen@blender.org <Jeroen Bakker>
Pull Request: https://projects.blender.org/blender/blender/pulls/137269
2025-04-10 14:22:55 +02:00
Jeroen Bakker
7ecacbc3e6 Vulkan/OpenXR: Support VK_KHR_external_memory_win32
This PR adds support for using a win32 handle to share the render
result with the OpenXR Vulkan instance. This is only possible when
the GPU matches. Otherwise a CPU roundtrip will be performed.

Pull Request: https://projects.blender.org/blender/blender/pulls/137093
2025-04-08 15:21:55 +02:00
Jeroen Bakker
22ae59d28d Vulkan: Include Win32 extensions definitions
Includes win32 specific extensions definitions when including
`vk_common.hh`. Inside `gpu_context.cc` vulkan needs to be
included before opengl, otherwise windows 10 builders will
report a warning.

```
[6421/7520] Building CXX object source\blender\gpu\CMakeFiles\bf_gpu.dir\intern\gpu_context.cc.obj
C:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared\minwindef.h(130): warning C4005: 'APIENTRY': macro redefinition
C:\Users\blender\git\blender-vexp\blender.git\lib\windows_x64\epoxy\include\epoxy/gl.h(59): note: see previous definition of 'APIENTRY'
```

Pull Request: https://projects.blender.org/blender/blender/pulls/137134
2025-04-08 14:10:01 +02:00
Jeroen Bakker
6785c5e3b9 Cleanup: Vulkan: incorrect include statement
vk_samplers.hh included itself
2025-04-08 09:35:41 +02:00
Josh Belanich
5cb2b04c5c Fix #130914: Vulkan memory leak while resizing view-port
This PR implements dynamic viewport state for the Vulkan gpu backend.
By doing so, it fixes #130914.

The following high-level changes were made:

1. The pipeline pool no longer uses the viewport and scissor
    states to identify graphics pipelines, only the number of viewports
    and the number of scissors. Graphics pipelines are configured with
    dynamic viewport and scissor states upon construction.
2. The desired viewport and scissor configurations for drawing are set
    in the data of the draw nodes in the render graph.
3. The draw nodes use these viewport and scissors settings in
    `build_commands`. If the viewport and scissor settings have changed
    between nodes, then vkCmdSetViewport and vkCmdSetScissor commands
    are sent to the command buffer.
4. Tests are updated to verify that set_viewport and set_scissor commands
   are executed the correct number of times. (Also note that I needed to
   #136987 in order to avoid skipping some Vulkan tests).

See the attached screencast for verification. The number of graphics pipelines
no longer grows when resizing the viewport.
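
Point 3 of the list above (only emitting a set-viewport command when the state changes between draw nodes) can be sketched as follows. This is a simplified model under stated assumptions, not the backend code: `Viewport`, `DrawNode`, `Command`, and this free-standing `build_commands` are hypothetical stand-ins for the render graph types, and scissors are left out for brevity.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the viewport state stored in a draw node.
struct Viewport {
  float x, y, width, height;
  bool operator==(const Viewport &other) const
  {
    return x == other.x && y == other.y && width == other.width && height == other.height;
  }
};

struct DrawNode {
  Viewport viewport;
};

enum class Command { SetViewport, Draw };

// Record commands for a list of draw nodes. A SetViewport command is only
// recorded when the viewport differs from the one recorded most recently.
std::vector<Command> build_commands(const std::vector<DrawNode> &nodes)
{
  std::vector<Command> commands;
  bool has_active = false;
  Viewport active{};
  for (const DrawNode &node : nodes) {
    assert(node.viewport.width > 0.0f && node.viewport.height > 0.0f);
    if (!has_active || !(active == node.viewport)) {
      commands.push_back(Command::SetViewport);
      active = node.viewport;
      has_active = true;
    }
    commands.push_back(Command::Draw);
  }
  return commands;
}
```

Because the viewport is no longer part of the pipeline key in this model, resizing only changes the recorded `SetViewport` values, not the number of pipelines.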

Pull Request: https://projects.blender.org/blender/blender/pulls/137002
2025-04-07 17:26:13 +02:00
Miguel Pozo
a5ed5dc4bf GPU: Support deferred compilation in ShaderCompilerGeneric
Update the `ShaderCompilerGeneric` to support deferred compilation
using the batch compilation API, so we can get rid of
`drw_manager_shader`.
This approach also allows supporting non-blocking compilation
for static shaders.

This shouldn't cause any behavior changes at the moment, since batch
compilation is not yet used when parallel compilation is disabled.

This adds a `GPUWorker` and a `GPUSecondaryContext` as an easy to use
wrapper for managing secondary GPU contexts.

(Part of #133674)
Pull Request: https://projects.blender.org/blender/blender/pulls/136518
2025-04-07 15:26:25 +02:00
Jeroen Bakker
a46643af0f Vulkan/OpenXR: Add support for VK_KHR_external_memory_fd
Current implementation uses a CPU roundtrip to transfer render result
to the Xr Swapchain. This PR adds support for sharing the render result
on Linux systems by using file descriptors.

Extending this solution to win32 or dx handles can be done by extending
the data transfer modes and registering the correct extensions. When not
using the same GPU between Blender and OpenXR the CPU roundtrip
will still be used.

Solution has been validated with monado simulator and seems to be as
fast as OpenGL.

Performance can be improved by using GPU based synchronization.
Current API is limited as we cannot chain the different renders and
swapchains.

Pull Request: https://projects.blender.org/blender/blender/pulls/136933
2025-04-04 16:01:06 +02:00
Clément Foucault
3562433ae7 pyGPU: Deprecate Shader.program getter
This is getting in the way of making the
GPUShader API more threadsafe.

This getter already doesn't work for vulkan
and Metal, and has very limited usage.

Keeping the python function to avoid errors
and display a deprecation warning.

Pull Request: https://projects.blender.org/blender/blender/pulls/136983
2025-04-04 14:23:09 +02:00
Omar Emara
56b0b709ea Compositor: Support GPU OIDN denoising
This patch supports GPU OIDN denoising in the compositor. A new
compositor performance option was added to allow choosing between CPU,
GPU, and Auto device selection. Auto will use whatever the compositor is
using for execution.

The change is twofold. First, the denoising code was adapted to use buffers
as opposed to passing pointers to filters directly; this is needed to
support GPU devices. Second, device creation is now a bit more involved:
it tries to choose the device that is being used by the compositor for
execution.

Matching GPU devices is done by choosing the OIDN device that matches
the UUID or LUID of the active GPU platform. We need both UUID and LUID
because not all platforms support both. UUID is supported on all
platforms except macOS Metal, while LUID is only supported on Windows and
macOS Metal.

If there is no active GPU device or matching is unsuccessful, we let
OIDN choose the best device, which is typically the fastest.
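
A minimal sketch of the matching rule described above, assuming devices are compared by raw identifier bytes. `DeviceInfo`, `DeviceId`, `kNoMatch`, and `find_matching_device` are hypothetical names for illustration; they are neither the OIDN API nor Blender's. An empty identifier stands for "not supported on this platform".

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using DeviceId = std::vector<std::uint8_t>;

// Hypothetical description of a denoising device.
struct DeviceInfo {
  std::string name;
  DeviceId uuid; /* Empty when the platform reports no UUID. */
  DeviceId luid; /* Empty when the platform reports no LUID. */
};

constexpr int kNoMatch = -1;

// Return the index of the first device whose UUID or LUID equals the one of
// the active GPU. Return kNoMatch when nothing matches, in which case the
// caller falls back to letting the denoising library pick a device.
int find_matching_device(const std::vector<DeviceInfo> &devices,
                         const DeviceId &active_uuid,
                         const DeviceId &active_luid)
{
  for (std::size_t i = 0; i < devices.size(); i++) {
    const DeviceInfo &device = devices[i];
    const bool uuid_match = !active_uuid.empty() && device.uuid == active_uuid;
    const bool luid_match = !active_luid.empty() && device.luid == active_luid;
    if (uuid_match || luid_match) {
      return static_cast<int>(i);
    }
  }
  return kNoMatch;
}
```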

To support this case, UUID and LUID identifiers were added to the
GPUPlatformGlobal and are initialized by the GPU backend if supported.
OpenGL now requires GL_EXT_memory_object and GL_EXT_memory_object_win32
to support this use case, but it should function without it.

Pull Request: https://projects.blender.org/blender/blender/pulls/136660
2025-04-04 11:17:08 +02:00
Clément Foucault
f8de6c31bc EEVEE: Move Object ID storage to gbuffer header layer
This allows storing the full object ID inside a `uint32`
buffer. This makes it possible to get the per-object data in deferred
passes and avoids storing object data inside the GBuffer.

This data is only written if needed.

This required modifying the implementation of subpass input
for all backends to be able to bind layered textures.
This currently works because only layer 0 is bound to the
framebuffer. This is fragile but I don't see a good builtin way
to fix it.

Rel #135935

#### Tasks
- [x] Replace light linking bits in Gbuffer
- [x] Replace Object ID in GBuffer for SSS
- [x] Conditional storage
- [x] Dummy storage if not needed

Pull Request: https://projects.blender.org/blender/blender/pulls/136428
2025-04-03 14:00:55 +02:00
Jeroen Bakker
aed9f22233 Refactor: Vulkan: swapchain
This PR refactors how swapchains are used.

Allow scaling of the swapchain content to the actual resolution of the swapchain.
This can reduce artifacts when resizing windows, when supported.

When the frame rate was too fast, the previous implementation could use a semaphore
that was still in use, leading to unwanted stuttering on certain platforms. The new
implementation waits until rendering has finished (GHOST_Frame.submission_fence) before the
next image is acquired from the swapchain.

Mailbox has been disabled as it can calculate more frames than are actually
presented, leading to lag and increased power usage on others.

Pull Request: https://projects.blender.org/blender/blender/pulls/136603
2025-04-01 16:01:22 +02:00
Jeroen Bakker
b3c4190cf7 Fix #134928: Vulkan: Out of bounds framebuffer region
When making a minimized window larger Blender can have negative regions.
This leads to out-of-bounds writes when blitting to the framebuffer.

Easily reproducible on NVIDIA/Windows.

Pull Request: https://projects.blender.org/blender/blender/pulls/136832
2025-04-01 15:06:13 +02:00
Campbell Barton
fc8f6ee853 Cleanup: resolve ignored qualifier warning for CLANG
2025-04-01 01:01:38 +00:00
Jeroen Bakker
5e26f5cc2a Vulkan: Reduce lag on certain platforms.
After reviewing the locations where `GPU_flush()` is used, it doesn't seem
to be harmful to include these for the Vulkan backend as well. Hopefully this
will save some lag that can happen when submitting one huge render graph.

Improved playback of rain_restaurant.blend where frames could be dropped,
resulting in UI lag.

Pull Request: https://projects.blender.org/blender/blender/pulls/136654
2025-03-31 12:16:48 +02:00
Jeroen Bakker
3885a37541 Vulkan: Initial OpenXR support
Blender's VkInstance cannot be shared with the OpenXR VkInstance. The
reason is a chicken-and-egg problem where OpenXR needs to be started
before Vulkan. OpenXR can add special Vulkan-specific requirements
(instance & device) that are only available when the user starts an OpenXR
session.

The goal of the implementation is to share memory between both instances using
[VK_KHR_external_memory](https://registry.khronos.org/vulkan/specs/latest/man/html/VK_KHR_external_memory.html) and related extensions. However, this seems
to be a bridge too far as an initial step. Reason: there are not many
samples, guides, or documentation to be found that handle the workflow
we require. We want to take a smaller step-by-step approach to gain the needed
knowledge.

For that reason this PR does the most naive thing that can be done to
share memory between instances: download the render result to CPU RAM and share
the host pointer with the OpenXR instance, which copies it to the swapchain.
The synchronization is also done using wait idle commands.

<video src="attachments/32a0d69b-c3fa-4272-aea0-d207609afaaf" title="Screencast From 2025-03-18 11-16-17.webm" controls></video>

**Gaining knowledge**

- Experiment with `VK_KHR_external_memory_host` extension for uploading vertex buffers (not related to OpenXR).
- Import host pointer with `VK_KHR_external_memory_host`. This reduces the additional
  memcpy on OpenXR side.
- Export host pointer from Blender side from a mappable buffer.
- Replace host pointers with fd/dmabuf/winhandle
- Remove mappable buffer.

Ref #133718

Pull Request: https://projects.blender.org/blender/blender/pulls/133824
2025-03-27 16:57:51 +01:00
Jeroen Bakker
d5bef6cb01 Cleanup: Remove unused code
2025-03-27 14:09:15 +01:00
Campbell Barton
42ad772a1f Cleanup: spelling & repeated terms (make check_spelling_*)
Also use comment blocks for English text.
2025-03-27 01:13:34 +00:00
Jeroen Bakker
3c13d14e83 Cleanup: Remove incorrect CPP attribute
Parameter was tagged to be deprecated, but in fact it is not.
2025-03-25 12:47:28 +01:00
Jeroen Bakker
409ce2b976 Vulkan: Swapchain synchronization
This PR adds swapchain synchronization. When the swapchain swaps the
buffers it can add a wait semaphore/signal semaphore to support GPU-based
synchronization.

10 times playback of `rain_restaurant.blend` on AMD RX 7700
Before: 10 × Animation playback: 72347.5540 ms, average: 7234.75539684 ms
After: 10 × Animation playback: 41523.2441 ms, average: 4152.32441425 ms

Getting around the OpenGL performance target.

Pull Request: https://projects.blender.org/blender/blender/pulls/136259
2025-03-24 10:28:52 +01:00
Jeroen Bakker
a92981e77b Refactor: Vulkan: Move render graph submission into device_submission.cc
Pull Request: https://projects.blender.org/blender/blender/pulls/136257
2025-03-20 15:55:30 +01:00
Jeroen Bakker
4429cc7e84 Fix: Vulkan: Incorrect framebuffer selection
When swap chain is updated the logic could select an incorrect
framebuffer. This isn't actually the case during normal usage, but has
been detected during the development of OpenXR support. Here it did
matter.

Pull Request: https://projects.blender.org/blender/blender/pulls/136115
2025-03-18 11:49:52 +01:00
Jeroen Bakker
5a3fd4522c Fix #135929: Vulkan: Add support for line loops in immediate rendering
Currently only implemented for immediate mode. When used it copies the
first vertex to the last vertex to complete the loop.
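
The approach described above amounts to drawing the loop as a line strip with the first vertex repeated at the end. A minimal sketch, assuming a plain 2D vertex type; `Vertex` and `line_loop_to_strip` are hypothetical names, not the immediate-mode code itself.

```cpp
#include <cassert>
#include <vector>

struct Vertex {
  float x, y;
};

// Convert line-loop vertices into line-strip vertices by appending a copy of
// the first vertex, which closes the loop.
std::vector<Vertex> line_loop_to_strip(const std::vector<Vertex> &loop)
{
  std::vector<Vertex> strip = loop;
  if (!loop.empty()) {
    strip.push_back(loop.front());
    assert(strip.size() == loop.size() + 1);
  }
  return strip;
}
```

A triangle given as three loop vertices becomes a four-vertex strip, so the draw call needs room for one extra vertex.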

Pull Request: https://projects.blender.org/blender/blender/pulls/136083
2025-03-17 15:32:49 +01:00
Jeroen Bakker
c4feddefd7 Refactor: Vulkan: Split VKWorkarounds
VKWorkarounds adds double negation. This PR splits
the struct into workarounds and extensions to reduce
confusing code.

Pull Request: https://projects.blender.org/blender/blender/pulls/136064
2025-03-17 09:06:47 +01:00
Jeroen Bakker
7857d9e3bf Fix: Vulkan: Std430 push constant packing
When using vec3[] as push constants it selected the incorrect
branch resulting in uploading incorrect data to the shader.

This resulted in not seeing the clipping bounds in vulkan.
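
For context, in the std430 layout each element of a `vec3[]` array starts on a 16-byte boundary even though only 12 bytes carry data. The commit message does not show the corrected branch, so the sketch below only illustrates that layout rule; `Vec3`, `pack_vec3_array_std430`, and `read_float` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct Vec3 {
  float x, y, z;
};

// Array stride of vec3 elements under std430: 16 bytes (12 data + 4 padding).
constexpr std::size_t kStd430Vec3ArrayStride = 16;

// Pack tightly stored Vec3 values into a std430-style byte buffer.
std::vector<unsigned char> pack_vec3_array_std430(const std::vector<Vec3> &values)
{
  static_assert(sizeof(Vec3) == 12, "Vec3 is expected to be tightly packed");
  std::vector<unsigned char> bytes(values.size() * kStd430Vec3ArrayStride, 0);
  for (std::size_t i = 0; i < values.size(); i++) {
    std::memcpy(bytes.data() + i * kStd430Vec3ArrayStride, &values[i], sizeof(Vec3));
  }
  return bytes;
}

// Read one float back from the packed buffer at a byte offset.
float read_float(const std::vector<unsigned char> &bytes, std::size_t offset)
{
  assert(offset + sizeof(float) <= bytes.size());
  float value;
  std::memcpy(&value, bytes.data() + offset, sizeof(float));
  return value;
}
```

Copying the same data with a 12-byte stride would shift every element after the first, which is the kind of mismatch that uploads incorrect values to the shader.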

Ref: #131111
2025-03-13 16:21:28 +01:00
Jeroen Bakker
330583961a Fix: Vulkan: Incorrect background blending
`GPU_BLEND_BACKGROUND` set incorrect blend mode, resulting
in incorrect rendering when activating bordered rendering.

Ref: #131111
2025-03-13 16:21:28 +01:00
Jeroen Bakker
15d88e544a GPU: Storage buffer allocation alignment
Since the introduction of storage buffers in Blender, the calling
code has been responsible for ensuring the buffer meets allocation
requirements. All backends require the allocation size to be divisible
by 16 bytes. Until now, this was sufficient, but with GPU subdivision
changes, an external library must also adhere to these requirements.

For OpenSubdiv (OSD), some buffers are not 16-byte aligned, leading
to potential misallocation. Currently, this is mitigated by allocating
a few extra bytes, but this approach has the drawback of potentially
reading unintended bytes beyond the source buffer.

This PR adopts a similar approach to vertex buffers: the backend handles
extra byte allocation while ensuring data uploads and downloads function
correctly without requiring those additional bytes.
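
The idea of letting the backend pad the allocation while uploads touch only the caller's bytes can be sketched as below. This is a CPU-side model under stated assumptions: `StorageBuffer`, `align_up`, and `kAlignment` are hypothetical names, and a `std::vector` stands in for the GPU allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

constexpr std::size_t kAlignment = 16;

// Round `size` up to the next multiple of `alignment`.
constexpr std::size_t align_up(std::size_t size, std::size_t alignment)
{
  return (size + alignment - 1) / alignment * alignment;
}

class StorageBuffer {
 public:
  // Allocate a padded buffer but remember the size the caller asked for.
  explicit StorageBuffer(std::size_t requested_size)
      : requested_size_(requested_size), data_(align_up(requested_size, kAlignment), 0)
  {
  }

  // Copy only the requested number of bytes, so the source buffer is never
  // read past its end even though the allocation is larger.
  void update(const void *src)
  {
    if (requested_size_ == 0) {
      return;
    }
    assert(src != nullptr);
    std::memcpy(data_.data(), src, requested_size_);
  }

  std::size_t requested_size() const { return requested_size_; }
  std::size_t allocated_size() const { return data_.size(); }
  const unsigned char *data() const { return data_.data(); }

 private:
  std::size_t requested_size_;
  std::vector<unsigned char> data_;
};
```

With this split the caller can pass a 20-byte source buffer while the allocation is 32 bytes, and the padding is never read from the source.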

No changes were needed for Metal, as its allocation size is already
aligned to 256 bytes.

**Alternative solutions considered**:

- Copying the CPU buffer to a larger buffer when needed (performance impact).
- Modifying OSD buffers to allocate extra space (requires changes to an external library).
- Implementing GPU_storagebuf_update_sub.

Ref #135873

Pull Request: https://projects.blender.org/blender/blender/pulls/135716
2025-03-13 15:05:16 +01:00
Jeroen Bakker
e1d2eee02b Cleanup: Vulkan: Remove unused variable
2025-03-13 13:31:13 +01:00
Jeroen Bakker
1ea1f4c92c Refactor: GHOST/Vulkan: Wrap handles in a struct
Vulkan handles are currently only requested once. In the future OpenXR
also needs access to these handles, and additional handles will be needed
when introducing copy queues and async compute.

This PR will collect the handles in a struct to ensure we don't need to
alter the GHOST interface for every change.

Pull Request: https://projects.blender.org/blender/blender/pulls/135905
2025-03-13 11:06:20 +01:00
Jeroen Bakker
cdc37b2235 GPU: Add support for GPU_vertbuf_update_sub
`GPU_vertbuf_update_sub` is used by GPU based subdivision to integrate
quads, triangles and edges. This is just an implementation to make it
work as we are planning bigger changes to improve performance of
uploading data to the GPU.

Pull Request: https://projects.blender.org/blender/blender/pulls/135774
2025-03-11 10:14:00 +01:00
Jeroen Bakker
ba22e5e6be Merge branch 'blender-v4.4-release'
2025-03-10 08:49:37 +01:00
Jeroen Bakker
eceb81b21f GPU: Remove RDNA2 shader viewport workaround
It has been confirmed that the latest release of AMD drivers has fixed
issues for both OpenGL and Vulkan. Users should use AMD driver 25.3.1
or later. Removing the workaround as it has performance penalties on
RDNA2 based GPUs.

Reference: #135516
Pull Request: https://projects.blender.org/blender/blender/pulls/135630
2025-03-10 07:22:02 +01:00
Jeroen Bakker
be4f9c0ac8 Merge branch 'blender-v4.4-release'
2025-03-06 16:30:16 +01:00
Jeroen Bakker
37d781aa2a Fix #135516: Vulkan: Shader output viewport broken on RDNA2
When using the official RDNA2 driver with Vulkan we see the same issue
as #123787. Adding the same workaround to Vulkan as well.

Pull Request: https://projects.blender.org/blender/blender/pulls/135565
2025-03-06 16:28:47 +01:00
Brecht Van Lommel
3dab100860 Fix: ASAN errors after addition of texture pool
Same fix as #132504. Free the texture pool before the derived GPU context
class, as that one is used as part of freeing the texture pool.

Pull Request: https://projects.blender.org/blender/blender/pulls/135444
2025-03-04 16:54:05 +01:00
Bastien Montagne
318ae49f1e Cleanup: Remove void * handling from MEM_freeN<T>.
Followup to 48e26c3afe, and discussions in !134771 about keeping
'C-style' and 'C++ template type-safe style' implementations of our
guardedalloc separated. And it makes `MEM_freeN<T>` code simpler.

Also skip type-checking in `MEM_freeN<T>` only with MSVC, as clang-cl on
windows-arm64 does work fine with DNA structs using
`DNA_DEFINE_CXX_METHODS`.

Pull Request: https://projects.blender.org/blender/blender/pulls/134861
2025-02-20 16:42:22 +01:00
Jeroen Bakker
f89a075015 Merge branch 'blender-v4.4-release'
2025-02-17 08:58:44 +01:00
Jeroen Bakker
0faba244a5 Fix: Vulkan: Async readback of storage buffers
The Vulkan backend was implemented with async in mind; however, the one place
where Blender uses async readback was implemented as blocking. This PR splits the
readback into flushing the commands and waiting for the readback.

**Performance**

Improvement of animation playback performance of shader balls.blend is around 10%.
Shader balls.blend frame: 1-100, 10 x animation playback

| Branch               | Total time | Average time |
| -------------------- | ---------- | ------------ |
| blender-v4.4-release | 26851 ms   | 2685 ms      |
| This PR              | 23675 ms   | 2367 ms      |
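
As a quick check of the figures in the table, the relative improvement can be computed from the two total times. `percent_reduction` is a hypothetical helper written for this illustration only.

```cpp
#include <cassert>

// Percentage by which `after_ms` is lower than `before_ms`.
double percent_reduction(double before_ms, double after_ms)
{
  assert(before_ms > 0.0);
  return (before_ms - after_ms) / before_ms * 100.0;
}
```

Calling `percent_reduction(26851.0, 23675.0)` gives roughly 11.8, in line with the "around 10%" stated above.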

Pull Request: https://projects.blender.org/blender/blender/pulls/134227
2025-02-17 08:58:06 +01:00
Clément Foucault
86b70143d5 Cleanup: GPU: Remove unused Transform Feedback implementation
Most of the cleanup is inside the metal backend.

Pull Request: https://projects.blender.org/blender/blender/pulls/134349
2025-02-10 17:30:42 +01:00
Campbell Barton
9154b5d14a Cleanup: correct misleading variable name
Don't mix up the "patch" version and the "subversion".
2025-02-06 10:12:39 +11:00
Jeroen Bakker
3d20d39115 Cleanup: Vulkan: Use is_link_to_buffer
The previous implementation used the resource state tracker, which is a hash
table lookup. `is_link_to_buffer` is a bit cheaper as it compares
already loaded data.
2025-02-04 16:28:46 +01:00
Jeroen Bakker
aa535f1a5f Cleanup: Vulkan: Remove resource locking when reordering nodes
This PR changes the resource locking when reordering render graph
nodes. Reordering could be done without locking resources. No measurable
speedup detected.

Pull Request: https://projects.blender.org/blender/blender/pulls/134032
2025-02-04 13:24:01 +01:00
Jeroen Bakker
7f04a4fef3 Fix: Vulkan: Stalling shader compilation
This PR fixes an issue where shader compilation could stall. This could
be seen in the viewport (sometimes not showing the first EEVEE render) but
was more prominent when running test cases.

Pull Request: https://projects.blender.org/blender/blender/pulls/134020
2025-02-04 09:49:28 +01:00
Jeroen Bakker
27b9173081 Cleanup: Code-style
Remove commented out parameter-name from header.
2025-02-03 08:06:10 +01:00
Brecht Van Lommel
c7502b092d Cleanup: Various clang-tidy warnings in gpu
Pull Request: https://projects.blender.org/blender/blender/pulls/133734
2025-01-31 17:03:18 +01:00
Jeroen Bakker
96c9153c5e Fix: Vulkan: Use after free of VkBufferView
VkBufferViews could be used after they were freed. The reason is that
they were not managed by the discard pool. Detected when looking into
failing render tests (pointcloud_motion.blend).

This part of the API is used by motion blur in EEVEE. Fixes the following
render tests:

- `eevee_next_motion_blur_vulkan`
- `eevee_next_pointcloud_vulkan`
- `eevee_next_hair_vulkan`

Related: #133546
Pull Request: https://projects.blender.org/blender/blender/pulls/133856
2025-01-31 11:41:39 +01:00
Jeroen Bakker
4dbb9b34c6 Fix: Vulkan: Compositor cryptomatte
When using cryptomatte the last identifier was never used due to a
memory alignment issue. Scalar types should not be aligned, but they
were.

Pull Request: https://projects.blender.org/blender/blender/pulls/133815
2025-01-30 15:51:09 +01:00
Jeroen Bakker
4b99bc8515 Fix: Renderdoc: Corruption in debug stack
In RenderDoc the debug stack got corrupted when render graphs were
reused. The previous usage didn't clear the stack. This PR clears
the debug stack when render graphs are reset.
2025-01-30 13:46:21 +01:00
Campbell Barton
bd1ded952b Cleanup: spelling in comments
2025-01-29 12:31:19 +11:00