These tests were hitting an assert about
invalid format. They were testing for this
legacy format we don't support anymore.
# Conflicts:
# source/blender/gpu/vulkan/tests/vk_data_conversion_test.cc
Caused by 617858e453.
These formats should use types aligned to 4 bytes. That's generally
required by modern GPUs. Uploading with these types also avoids
automatic conversion by the Vulkan backend which is something
we're hoping to remove fully.
In the end this PR removes a bunch of code related to supporting
the older single-byte formats.
Pull Request: https://projects.blender.org/blender/blender/pulls/138836
Fixes a few render synchronization soundness issues I discovered when investigating
why resetting the pools after the submission fence in #137305 ended up causing #137395.
1. Ensure the number of submission fences in GHOST_ContextVK is the same
as the number of resource pools in VKThreadData. Otherwise, the
fences might get misaligned, making it difficult to predict which
submission a fence's signal corresponds to.
2. In swapBuffers, pass the current m_frame_data's fence and semaphores
to the Vulkan backend callbacks, rather than the following
m_frame_data. This fixes an observed soundness issue where the fences
were off-by-one. Now, the backend callbacks can be sure that both the
next frame is ready for construction (and it's resources can be
cleaned), and will use the correctly aligned fences and semaphores
during command buffer submission.
3. Do not recreate the m_frame_data's fences during recreateSwapchain.
This would lead to unsoundness immediately following
recreateSwapchain where all the fences are signaled but those frames
might still be in flight.
Pull Request: https://projects.blender.org/blender/blender/pulls/137580
When using motion blur GPU materials and its resources can be freed when
still in use. This fix adds a workaround to store these resources
temporarily in a render discard pile. When rendering is finished (or
between frames) the resources are moved to the regular discard pile.
Pull Request: https://projects.blender.org/blender/blender/pulls/138809
Due to an oversight in the pipeline pool/shader, pipelines that were
invalid (as subresources where destroyed) could still be selected for
reused. This happened more often when using the compositor as it
recreates shaders a lot. It also depended on how handles are reuses by
driver implementation.
This is fixed by discarded any associated pipeline when the pipeline
layout is discarded.
Pull Request: https://projects.blender.org/blender/blender/pulls/138800
Part of #136993.
Share as much of the ShaderCompiler implementations as possible.
Remove the ShaderCompiler/ShaderCompilerGeneric split and make most of
its functions non virtual.
Move the `get_compiler` function from `Context` to `GPUBackend` and
creation/deletion to `GPUBackend::init/delete_resources`.
Add a `batch_cancel` function to `ShaderCompiler` (needed for the
GPUPass refactor).
As a nice extra, the multithreaded OpenGL compilation has become faster
too.
The barbershop materials + EEVEE static shaders have gone from 27s to
22s.
I have not observed any performance difference on Vulkan or Metal.
Pull Request: https://projects.blender.org/blender/blender/pulls/136676
These functions are trivial and shouldn't add the cost of a call.
They appeared in profiles, which they shouldn't since they mostly
just return access to member variables. Inlining them reduces
the backend's overhead when sculpting.
Also reserve a Vector before repeated appending.
Pull Request: https://projects.blender.org/blender/blender/pulls/138349
This avoid having to guards functions that are
only available in fragment shader stage.
Calling the function inside another stage is still
invalid and will yield a compile error on Metal.
The vulkan and opengl glsl patch need to be modified
per stage to allow the fragment specific function
to be defined.
This is not yet widely used, but a good example is
the change in `film_display_depth_amend`.
Rel #137261
Pull Request: https://projects.blender.org/blender/blender/pulls/138280
Implemented the VKSamplers based on the documentation in GPU_texture.hh.
They used to be reversed engineered from the OpenGL backend, however
Vulkan has better control over mipmap filtering and OpenGL needed to
combine some flexibility inside the min/mag filtering.
Changes:
- Mipmap filtering was always on, not only when `GPU_SAMPLER_FILTERING_MIPMAP`
is used.
- Mipmap filter mode is always set to linear. This is not according to the
documentation, but matches OpenGL and the render tests.
Fixes#136439, #138111
Partial fix#137436 - seam isn't there, inf can still be part of the
texture.
Pull Request: https://projects.blender.org/blender/blender/pulls/138313
Update descriptor pool sizes to be better for smaller scenes. Creating,
resetting and destroying pools are expensive operations. In smaller
scenes this can lead to lower performance.
The improvements may differ per platforms or not be visible.
Pull Request: https://projects.blender.org/blender/blender/pulls/138300
The use-case of this blend mode is to be able to make parts of an
viewport overlay transparent.
The future user of this blend mode is sequencer preview drawing
where frame will be drawn to an HDR render frame-buffer, and overlays
drawn on-top. In a way it is similar to the image engine, but without
need to have custom shader.
Ref #138094
Pull Request: https://projects.blender.org/blender/blender/pulls/138307
In a profile of sculpting with the Vulkan GPU backend enabled,
This function made up 0.7% of samples. Since it's just a single
comparison, inlining it should be helpful for the compiler.
Pull Request: https://projects.blender.org/blender/blender/pulls/138210
This is completely unused, not implemented for the Vulkan backend, and
seems to add quite a bit of complexity to the Metal and OpenGL backends.
It was added for EEVEE legacy motion blur, and the last use was removed
along with EEVEE legacy. We're probably better off not maintaining it since
we should avoid duplicating vertex buffer data anyway.
Pull Request: https://projects.blender.org/blender/blender/pulls/138226
Each time when rendering an image a new context is created. When the
context is destroyed it can still contain a render graph, which ownership
isn't transferred back to the device. Resulting in an increase of several
MB per render. The render graph is cleanly destroyed during quiting as
there is a master list of created render graphs.
Fixed by moving the ownership of the render graph back to the device
when a context is unregistered.
Pull Request: https://projects.blender.org/blender/blender/pulls/138095
When allocating a large vertex buffer on NVIDIA it tried to allocate it
on the GPU and host visible. This section is limited in size (256MB).
However when allocating a large vertex buffer it should not have been
chosen.
Detected during investigation of #137909.
It removes the validation error, but it is unclear yet if this solves the'
crash as I wasn't able to reproduce the crash.
Pull Request: https://projects.blender.org/blender/blender/pulls/138079
* Pixel buffer is always allocated with export and dedicated memory flags.
* Returns an opaque file descriptor (Unix) or handle (Windows).
* Native handle now includes memory size as it may be slightly bigger
than the requested size.
Pull Request: https://projects.blender.org/blender/blender/pulls/137363
Previously spell checker ignored text in single quotes however this
meant incorrect spelling was ignored in text where it shouldn't have
been.
In cases single quotes were used for literal strings
(such as variables, code & compiler flags),
replace these with back-ticks.
In cases they were used for UI labels,
replace these with double quotes.
In cases they were used to reference symbols,
replace them with doxygens symbol link syntax (leading hash).
Apply some spelling corrections & tweaks (for check_spelling_* targets).
For performance reasons render graphs can keep memory allocated so it
could be reused. This PR optimizes the memory usage inside the
rendergraph to keep it within normal usage.
I didn't detect any performance regression with this change but reduces
the memory when performing final image rendering of heavy scenes.
Partial fix for #137382. the amount of memory still increases with 4mb
per render. It fixes the main difference when using large scenes.
Pull Request: https://projects.blender.org/blender/blender/pulls/137660
Descriptor pools were never discarded, leading to out of memory issues
when running for a long time. This PR discards used descriptor pool when
the render graph is submitted to the device.
- Detected that to descriptor sets could be uploaded multiple times
however once was always empty.
- When render graph is flushed all descriptor pools are discarded.
- Improved debugging of discard pools.
Pull Request: https://projects.blender.org/blender/blender/pulls/137521
A couple of memory leak fixes for the vulkan backend.
We increment the submission_id on render_graphs upon reset. This
triggers cleanup of anything tracked as a VKResourceTracker. Notably
uniform buffers created for push constant fallbacks. This fixes a memory
leak that was accumulating VKUniformBuffers every frame without cleaning
them up.
Reset resource pools when a swapchain image is presented. This ends up
calling vkResetDescriptorPool, freeing up descriptor set resources. This
fixes a memory leak that was accumulate descriptor sets and pools over
time without freeing them.
Pull Request: https://projects.blender.org/blender/blender/pulls/137305
Incorrect data format was selected when using CPU data transfers in
OpenXR. It always used `GPU_DATA_HALF_FLOAT`, also when the swapchains
where `GPU_RGBA8`. This resulted in black screens in release mode, and
asserts in debud mode.
Fixed by selecting the correct data transfer data type based on the
swapchain format.
Co-authored-by: jeroen@blender.org <Jeroen Bakker>
Pull Request: https://projects.blender.org/blender/blender/pulls/137269
This PR add support to use a win32 handle to perform share render
result with the OpenXR vulkan instance. This is only possible when
the GPU matches. Otherwise a CPU roundtrip will be performed.
Pull Request: https://projects.blender.org/blender/blender/pulls/137093
Includes win32 specific extensions definitions when including
`vk_common.hh`. Inside `gpu_context.cc` vulkan needs to be
included before opengl, otherwise windows 10 builders will
report a warning.
```
[6421/7520] Building CXX object source\blender\gpu\CMakeFiles\bf_gpu.dir\intern\gpu_context.cc.obj
C:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared\minwindef.h(130): warning C4005: 'APIENTRY': macro redefinition
C:\Users\blender\git\blender-vexp\blender.git\lib\windows_x64\epoxy\include\epoxy/gl.h(59): note: see previous definition of 'APIENTRY'
```
Pull Request: https://projects.blender.org/blender/blender/pulls/137134
This PR implements dynamic viewport state for the Vulkan gpu backend.
By doing so, it fixes#130914.
The following high-level changes were made:
1. The pipeline pool no longer uses the viewport and scissor
states to identify graphics pipelines, only the number of viewports
and the number of scissors. Graphics pipelines are configured with
dynamic viewport and scissor states upon construction.
2. The desired viewport and scissor configurations for drawing are set
in the data of the draw nodes in the render graph.
3. The draw nodes use these viewport and scissors settings in
`build_commands`. If the viewport and scissor settings have changed
between nodes, then vkCmdSetViewport and vkCmdSetScissor commands
are sent to the command buffer.
4. Tests are updated to verify that set_viewport and set_scissor commands
are executed the correct number of times. (Also note that I needed to
#136987 in order to avoid skipping some Vulkan tests).
See the attached screencast for verification. The number of graphics pipelines
no longer grow when resizing the viewport.
Pull Request: https://projects.blender.org/blender/blender/pulls/137002
Update the `ShaderCompilerGeneric` to support deferred compilation
using the batch compilation API, so we can get rid of
`drw_manager_shader`.
This approach also allows supporting non-blocking compilation
for static shaders.
This shouldn't cause any behavior changes at the moment, since batch
compilation is not yet used when parallel compilation is disabled.
This adds a `GPUWorker` and a `GPUSecondaryContext` as an easy to use
wrapper for managing secondary GPU contexts.
(Part of #133674)
Pull Request: https://projects.blender.org/blender/blender/pulls/136518
Current implementation uses a CPU roundtrip to transfer render result
to the Xr Swapchain. This PR adds support for sharing the render result
on Linux systems by using file descriptors.
To extend this solution to win32 or dx handles can be done by extending
the data transfer modes, register the correct extensions. When not
using the same GPU between Blender and OpenXR the CPU roundtrip
will still be used.
Solution has been validated with monado simulator and seems to be as
fast as OpenGL.
Performance can be improved by using GPU based synchronization.
Current API is limited as we cannot chain the different renders and
swapchains.
Pull Request: https://projects.blender.org/blender/blender/pulls/136933
This is getting in the way of making the
GPUShader API more threadsafe.
This getter already doesn't work for vulkan
and Metal, and has very limited usage.
Keeping the python function to avoid errors
and display a deprecation warning.
Pull Request: https://projects.blender.org/blender/blender/pulls/136983
This patch supports GPU OIDN denoising in the compositor. A new
compositor performance option was added to allow choosing between CPU,
GPU, and Auto device selection. Auto will use whatever the compositor is
using for execution.
The code is two folds, first, denoising code was adapted to use buffers
as opposed to passing in pointers to filters directly, this is needed to
support GPU devices. Second, device creation is now a bit more involved,
it tries to choose the device is being used by the compositor for
execution.
Matching GPU devices is done by choosing the OIDN device that matches
the UUID or LUID of the active GPU platform. We need both UUID and LUID
because not all platforms support both. UUID is supported on all
platforms except MacOS Metal, while LUID is only supported on Window and
MacOS metal.
If there is no active GPU device or matching is unsuccessful, we let
OIDN choose the best device, which is typically the fastest.
To support this case, UUID and LUID identifiers were added to the
GPUPlatformGlobal and are initialized by the GPU backend if supported.
OpenGL now requires GL_EXT_memory_object and GL_EXT_memory_object_win32
to support this use case, but it should function without it.
Pull Request: https://projects.blender.org/blender/blender/pulls/136660
This allow to store the full object ID inside a `uint32`
buffer. This allows to get the per object data in deferred
passes and avoid to store object data inside the Gbuffer.
This data is only written if needed.
This had to modify the implementation of subpass input
for all backend to be able to bind layered texture.
This currently work because only the layer 0 is bound to the
framebuffer. This is fragile but I don't see a good builtin way
to fix it.
Rel #135935
#### Tasks
- [x] Replace light linking bits in Gbuffer
- [x] Replace Object ID in GBuffer for SSS
- [x] Conditional storage
- [x] Dummy storage if not needed
Pull Request: https://projects.blender.org/blender/blender/pulls/136428
This PR refactors the way how swapchains are used.
Allow scaling of the swapchain content to the actual resolution of the swapchain.
can reduce artefacts when resizing windows when supported.
When frame rate is to fast the previous implementation could use a semaphore
that were still in use, leading to unwanted stuttering on certain platforms. Waiting
when the rendering has finished (GHOST_Frame.submission_fence), before the
next image is acquired from the swap chain.
Mailbox has been disabled as it can calculate more frames then actually been
presented, leading to a lag and increased power usage on others.
Pull Request: https://projects.blender.org/blender/blender/pulls/136603
When making a minimized window larger Blender can have negative regions.
This leads to out of bound writes when blitting to the framebuffer.
Easy reproducable on NVIDIA/Windows.
Pull Request: https://projects.blender.org/blender/blender/pulls/136832
After reviewing the locations where `GPU_flush()` are used it doesn't seem
to be harmfull to include these for the Vulkan backend as well. Hopefully
will save some lag that can happen when submitting one huge render graph.
Improved playback of rain_restaurant.blend where frames could be dropped
resulting into UI lag.
Pull Request: https://projects.blender.org/blender/blender/pulls/136654