Depth navigation sends many small render graphs to the device. It can be
that a subsequent render graph uses the same shader as the previous one
with the same descriptor set tracker. The descriptor set tracker didn't
cleared its full state and a subsequent render graph was generating
commands assuming that the device was in a certain state.
However it wasn't and a command to bind a descriptor set was skipped
resulting in a device out of bound write. Depending on the platform this
could overwrite any data on the GPU, including shader programs as the
select shader writes to a storage buffer. This clarifies why the issue
resulted in very odd and none consistent behavior.
This PR fixes this by clearing the VKPipelineData and
VKDescriptorTracker.
Pull Request: https://projects.blender.org/blender/blender/pulls/140526
When quiting Blender the timeline doesn't get updated and an assert is
triggered that the order isn't correct. The order isn't that important
anymore as the mechanism has been tested. The assert was useful during
initial development.
This PR removes the assert as it isn't valid in all cases.
In several cases synchronization issues around discarded resources could lead
to crashes. These crashes where more prominent on NVIDIA as they reuse
their handles more often.
This PR requires external synchronization when data is moved from a
discard pool to the main orphaned data discard pool. Also the device
timeline should be guarded by the same mutex.
Pull Request: https://projects.blender.org/blender/blender/pulls/140274
We added descriptor buffers to 4.5 as it contains some core API changes.
However there have been various reports that the system isn't fully
mature resulting in lower performance especially on NVIDIA GPUs.
This PR will disable descriptor buffer feature for NVIDIA GPUs.
Pull Request: https://projects.blender.org/blender/blender/pulls/140263
Previous implementation allows to move VKBuffers, but didn't do a
proper std::move. On second thought it is a bad idea to be able to move
GPU resources. This PR removes the ability to move the buffer and
replace the usages with a unique ptr.
Pull Request: https://projects.blender.org/blender/blender/pulls/140103
This has limited use cases since it doesn't
profile the heavy part of the vulkan backend.
Almost 1:1 port of the metal implementation from #139551.
Doesn't cover rendergraph submission nor GPU timings.
Pull Request: https://projects.blender.org/blender/blender/pulls/139899
Color picking reads 3 values from a texture that is backed by 4
channels. The conversion would also convert the 4th channel.
This is a short term fix. We should reconsider the usage of reading a
different number of elements than backed by the texture. But that
requires work in the color picking and python GPU module.
Pull Request: https://projects.blender.org/blender/blender/pulls/139929
Descriptor sets/pools are known to be troublesome as it doesn't match
how GPUs work, or how application want to work, adding more complexity
than needed. This results is quite an overhead allocating and
deallocating descriptor sets.
This PR will use descriptor buffers when they are available. Most platforms
support descriptor buffers. When not available descriptor pools/sets
will be used.
Although this is a feature I would like to land it in 4.5 due to the API changes.
This makes it easier to fix issues when 4.5 is released.
The feature can easily be disabled by setting the feature to false if it has
to many problems.
Pull Request: https://projects.blender.org/blender/blender/pulls/138266
When buffers/images are allocated that use larger limits than supported
by the GPU blender would crash. This PR adds some safety mechanism to
allow Blender to recover from allocation errors.
This has been tested on NVIDIA drivers.
Pull Request: https://projects.blender.org/blender/blender/pulls/139876
Compilation constants are constants defined in the create info.
They cannot be changed after the shader is created.
It is a replacement to macros with added type safety.
Reuse most of the logic from Specialization constants.
Pull Request: https://projects.blender.org/blender/blender/pulls/139703
This PR changes loading of implicit vulkan layers. See #139543 where we
detected that there are vulkan layers installed on systems that try to
impersonate other software, but crashes when used in Blender.
This PR will reuse the VkInstance of GHOST_ContextVK when querying for
possible compatible devices. Previously a temporary VkInstance was
created but could trigger an error in the Vulkan loader.
Pull Request: https://projects.blender.org/blender/blender/pulls/139640
Race condition between the main thread and worker thread. In this case
the worker thread was waiting for a device idle, but the main thread was
waiting on a specific timeline. As long as the timeline didn't pass the
device was locked.
Releaved the race condition to check on queue idle as the timeline event
isn't part of the queue.
On windows drivers could be reset and could lead to other crashes as handles
were not used anymore.
Pull Request: https://projects.blender.org/blender/blender/pulls/139579
In this case a triangle shader was used to render points.
This change entails:
- Using point shaders in this case
- Add support for `GPU_point_size` to update the uniform of the point
shader.
Pull Request: https://projects.blender.org/blender/blender/pulls/139574
During rendering when main thread is blocked or all screens are minimized the
garbage collection will not happen resulting in crashes as resources are not freed.
A better solution would be to do garbage collection in a separate thread but that requires a
ref counting system. The specifics of such a system is still unclear.
A possible solution for a Vulkan specific ref counting is to store the ref counts in the
resource tracker. That would only handle images and buffers, but it would solve the most
resource hungry issues.
Pull Request: https://projects.blender.org/blender/blender/pulls/139475
On selected platforms there were some validation errors. It was caused by
platforms that returned a different number of swapchain images then were
requested. In that case the semaphores can get out of sync.
Current mechanism isn't future proof as the max number of images are
statically defined.
For this change the present semaphores is also separated from the frames
to better support out of order swapchain images.
Pull Request: https://projects.blender.org/blender/blender/pulls/139446
GHOST backend didn't use logging. This PR adds an initial ghost.vulkan
logging and improves the reporting of logging in vulkan.
logging can be enabled by `blender --log "gpu.vulkan,ghost.vulkan" --log-level 2`
it shows the optional extensions that are enabled and information about swap chain
events.
Pull Request: https://projects.blender.org/blender/blender/pulls/139437
Unlike OpenGL and Metal, this handle is not shared, but rather Cycles
has to take ownership of it. This required a fair amount of refactoring
to ensure the handle is closed, ownership is properly transferred, and
the handle is recreated once when the pixel buffer is modified.
Usage of conditional to fix threading and performance delays when
waiting for the submission fence to become valid. The previous
(faulty) implementation didn't work well on WoA devices.
Qualcomm devices don't support external memory, but there an external
memory pool was still being constructed. This PR skips the creation and
asserts when using external memory on those devices.
Pull Request: https://projects.blender.org/blender/blender/pulls/139326
The stencil value was also considered and could lead to out of range
depth values. These were ignored by operators and could lead to
printing errors, canceling operators, or inaccurate depth
selection.
This is a NVIDIA only issue as these GPU support DEPTH24S8 textures.
We should consider defaulting to DEPTH32FS8.
Pull Request: https://projects.blender.org/blender/blender/pulls/139272
This allows multiple threads to request different specializations without
locking usage of all specialized shaders program when a new specialization
is being compiled.
The specialization constants are bundled in a structure that is being
passed to the `Shader::bind()` method. The structure is owned by the
calling thread and only used by the `Shader::bind()`.
Only querying for the specialized shader (Map lookup) is locking the shader
usage.
The variant compilation is now also locking and ensured that
multiple thread trying to compile the same variant will never result
in race condition.
Note that this removes the `is_dirty` optimization. This can be added
back if this becomes a bottleneck in the future. Otherwise, the
performance impact is not noticeable.
Pull Request: https://projects.blender.org/blender/blender/pulls/136991
This feature greatly increase depth buffer precision.
This is very noticeable in large view distance scenes.
This is enabled by default on GPUs that supports it (most of the hardware we
support already supports this). This makes rendering different on the GPUs
that do not support that feature (`glClipControl`).
While this give much better depth precision as before, we also have a lot of
imprecision caused by our vertex transformations. This can be improved in
another task.
Pull Request: https://projects.blender.org/blender/blender/pulls/138898
When using float2/int2/uint2 arrays the elements could be incorrectly
alligned. This wasn't noticable when using Blender as it isn't used,
However python addons and forks can use it.
Fixes an issue with UPBGE
Pull Request: https://projects.blender.org/blender/blender/pulls/139082
Importing memory is done to often. when memory doens't change the
previous imported memory can be used.
The idea is to keep track of the last used buffer and keep reusing
it until the view/resolution has changed. This should not happen during
a session.
Pull Request: https://projects.blender.org/blender/blender/pulls/138984
This commit finishes removing the uses of the integer to float
vertex buffer fetch mode. Previous commits noted below already started
that process. The last usage was geometry attributes. Now integers are
converted to floats as part of the existing upload process.
The change makes the Vulkan vertex buffer type conversion unused, so
it's removed. That's nice because Vulkan vertex buffers go from 1040 to
568 bytes in size and have significantly less overhead on creation.
Related:
- 153abc372e
- 1e1ac2bb9b
- 617858e453
Pull Request: https://projects.blender.org/blender/blender/pulls/138873
- Reduce artifacts during resizing to also recreate the swapchain
when acquire image is suboptimal
- Do not stretch when backbuffer and swapchain have a different size
Pull Request: https://projects.blender.org/blender/blender/pulls/138925
Always discard the context discard pile even when not submitted
When rendering animation it is not guaranteed that the submission
flag will ever be set.
These tests were hitting an assert about
invalid format. They were testing for this
legacy format we don't support anymore.
# Conflicts:
# source/blender/gpu/vulkan/tests/vk_data_conversion_test.cc
Caused by 617858e453.
These formats should use types aligned to 4 bytes. That's generally
required by modern GPUs. Uploading with these types also avoids
automatic conversion by the Vulkan backend which is something
we're hoping to remove fully.
In the end this PR removes a bunch of code related to supporting
the older single-byte formats.
Pull Request: https://projects.blender.org/blender/blender/pulls/138836
Fixes a few render synchronization soundness issues I discovered when investigating
why resetting the pools after the submission fence in #137305 ended up causing #137395.
1. Ensure the number of submission fences in GHOST_ContextVK is the same
as the number of resource pools in VKThreadData. Otherwise, the
fences might get misaligned, making it difficult to predict which
submission a fence's signal corresponds to.
2. In swapBuffers, pass the current m_frame_data's fence and semaphores
to the Vulkan backend callbacks, rather than the following
m_frame_data. This fixes an observed soundness issue where the fences
were off-by-one. Now, the backend callbacks can be sure that both the
next frame is ready for construction (and it's resources can be
cleaned), and will use the correctly aligned fences and semaphores
during command buffer submission.
3. Do not recreate the m_frame_data's fences during recreateSwapchain.
This would lead to unsoundness immediately following
recreateSwapchain where all the fences are signaled but those frames
might still be in flight.
Pull Request: https://projects.blender.org/blender/blender/pulls/137580