The issue was that the shader `gpu_shader_gpencil_stroke_vert_no_geom.glsl`
assumed a wrong format of the color attribute (`uchar4` instead of `float4`).
The fix uses `vertex_fetch_attribute` with `float4`.
Pull Request: https://projects.blender.org/blender/blender/pulls/129072
Intel Windows drivers for 10th gen and lower has some strange behavior
when using dynamic rendering. It requires pipeline conditions to be met,
when beginning a new rendering scope. This is strange as specs + VVL
notes that these conditions should be met during vkCmdDraw* commands.
Pull Request: https://projects.blender.org/blender/blender/pulls/129055
This change makes unused attachments extension optional.
This extension is fairly new and not all drivers have support for it.
The workaround will create additional pipelines when attachments are
not set.
Pull Request: https://projects.blender.org/blender/blender/pulls/129046
VK_EXT_dynamic_rendering_unused_attachments is required for correct working.
Renderdoc hides this extension, but most platforms do work. However the
Windows Intel driver crashes when using iGPUs; they don't support this
extension at all.
This change does a more strict extension test so drivers that do not
support this extension will fallback to OpenGL. When using renderdoc it
is now allowed to compile blender with `WITH_RENDERDOC=On`.
Future developments are needed to add support for Intel iGPUs on
Windows.
Pull Request: https://projects.blender.org/blender/blender/pulls/128986
Resources are shared, when running multiple contexts on the same thread.
Cycles uses the same context on multiple threads and expected same resources.
This change will introduce a single render graph per context and an updated
resource management. Render graphs are not shared anymore; Resource pools
are still shared, but garbage collection depends on the thread and if
background rendering is used.
Pull Request: https://projects.blender.org/blender/blender/pulls/128983
Cycles uses multiple threads to send commands to the GPU. The current
command buffer structure assumed that all commands from the same context
were send via the same thread. This wasn't the case and could lead to
recording commands to command buffers that are still pending (preparing
commands to send to GPU).
This is fixed by creating a command buffer each time a render graph
submits its work.
Detected when researching #128608
Pull Request: https://projects.blender.org/blender/blender/pulls/128978
When allocating a host visible buffer it could be that the returned
buffer was not host visible and access to the buffer would write to
unallocated memory.
Detected when researching #128608
Pull Request: https://projects.blender.org/blender/blender/pulls/128977
When using cycles in the viewport there it uses render threads and
workers to update the viewport. All threads can record commands to the
queue and needs to be synchronized. This didn't happen leading to
incorrect renders.
Detected when researching #128608
Pull Request: https://projects.blender.org/blender/blender/pulls/128975
Multiple threads can access the same device queue from different
threads. This could happen when doing a cycles preview render, baking
eevee volume probes or generating material previews.
This PR adds a mutex around access to the device queues.
Detected when researching #128608
Pull Request: https://projects.blender.org/blender/blender/pulls/128974
When using vertex displacement EEVEE didn't compile the generated
functions into the vertex shader. This could result in errors when
compiling materials.
**Notes**
- Should be back-ported to Blender 4.2
Pull Request: https://projects.blender.org/blender/blender/pulls/128525
Adds an additional precheck to identify if the app cache dir is correct.
Reduces placing cache files all over the place when the app dir isn't
correct.
Adds a SPIR-V cache that skips frontend compilation for shaders
that are already compiled in a previous run of Blender.
Initially this was postponed to 4.4 but it was observed that
the vulkan backend didn't perform well on Windows in debug
builds. The reason is that the compiler would also be a debug
build which makes compiling a shader really slow. Starting
Blender on a debug build could take minutes.
So the decision was made to give this task a higher priority so
the vulkan backend would become more usable to developers
as well.
The cache is stored in the application cache dir. The SPIR-V
binaries can be used by different Blender versions so there
is no version specific cache folder.
**Sidecar**: SPIR-V files are a stream of bytes. There is no
header information that allow us to validate the stream. To
add basic validations we could add our custom header or
a sidecar. It was chosen to use a sidecar as having the SPIR-V
files unmodified allows us to load them directly in
debug tools for analyzing.
**Retention**: Shaders that are not used are automatically
removed with a retention period of 30 days.
**Shader builder**: Shader builder cannot use the SPIR-V
cache as it uses stubs that returns invalid cache directories.
This would load/save the cache to the location where you
started the build.
Pull Request: https://projects.blender.org/blender/blender/pulls/128741
Between 0bfd5e3536
and b1cbd9c889
the main branch is incorrectly processing the file
`draw_debug_info.hh` as GLSL and does some string
preprocessing on it. But the output filename matches
the name of the header source file used for compiling
the gpu module. This file not having been updated
since a long time doesn't get copied from the source
folder when switching to other branch and make compilation
fail.
In order to avoid breaking the buildbot longer, we
rename the incriminating file to force recreate it
when building the release branch.
When attaching a layered image with offset, the size of the attached
layers should be decreased. Otherwise an image view is created that can
access incorrect data.
Pull Request: https://projects.blender.org/blender/blender/pulls/128583
Related to #126974, which removed command reordering due to some
EEVEE/framebuffer requirements. However buffer can still be reordered
without any artifacts.
Update buffers are common operations and are often isolated; safe to
move them outside the rendering scope.
Pull Request: https://projects.blender.org/blender/blender/pulls/128538
This adds feature parity with Cycles regarding light and shadow liking.
Technically, this extends the GBuffer header to 32 bits, and uses
the top bits to store the object's light set membership index.
The same index is also added to `ObjectInfo` in place of padding bytes.
For shadow linking, the shadow blocker sets bitmask is stored per
tilemap. It is then used during the GPU culling phase to cull objects
that do not belong to the shadow's sets.
Co-authored-by: Clément Foucault <foucault.clem@gmail.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/127514
The metal backend assumes that textures can always be allocated. When
metal later detects that the texture cannot be 'baked' it leads to
undesired behavior as Blender think it has a texture with memory
allocated.
This PR only changes the GPU_TEXTURE_3D case as that leads to issues
when loading large voluetric data. Arrayed textures will still fail
as that requires different checks to be added. I rather re-view the
current implementation in the future.
> NOTE: The max depth is hardcoded to 2048
Change should be backported to 4.2
Pull Request: https://projects.blender.org/blender/blender/pulls/128365
When using AMD pro-drivers the limits reported of the device can be
`UINT_MAX` but are stored in int fields. In this case the limits would
become negative and GPU materials validation failed resulting into errors.
This is fixed by clamping the value to `INT_MAX`.
Pull Request: https://projects.blender.org/blender/blender/pulls/128437
Previously, Cycles only supported the Henyey-Greenstein phase function for volume scattering.
While HG is flexible and works for a wide range of effects, sometimes a more physically accurate
phase function may be needed for realism.
Therefore, this adds three new phase functions to the code:
Rayleigh: For particles with a size below the wavelength of light, mostly athmospheric scattering.
Fournier-Forand: For realistic underwater scattering.
Draine: Fairly specific on its own (mostly for interstellar dust), but useful for the next entry.
Mie: Approximates Mie scattering in water droplets using a mix of Draine and HG phase functions.
These phase functions can be combined using Mix nodes as usual.
Co-authored-by: Lukas Stockner <lukas@lukasstockner.de>
Pull Request: https://projects.blender.org/blender/blender/pulls/123532
We can have deferred and non-deferred shaders (so, different threads)
with the same `additonal_info` dependencies trying to finalize the same
`ShaderCreateInfo`.
This ensures `finalize` always runs from the main thread to avoid race
conditions.
Pull Request: https://projects.blender.org/blender/blender/pulls/128281
This change adds the option to update a buffer via the render
graph via `vkCmdUpdateBuffer`. This is only enabled for
uniform buffers as they are small and aligned/sized correctly.
Pull Request: https://projects.blender.org/blender/blender/pulls/128416
This PR introduces support for the extension `VK_KHR_fragment_shader_barycentric`,
and includes a few miscellaneous improvements related to it.
1. Add support for `VK_KHR_fragment_shader_barycentric`, if the physical device
supports it. Otherwise, gpu_BaryCoord is generated through an injected geom
shader, like it was previously.
2. Simplify the logic of checking has_geometry_stage in vert shader.
3. Fix a potential issue of location mismatch in an injected geom shader.
Related to #127687Resolves#126228
Pull Request: https://projects.blender.org/blender/blender/pulls/127995
When performing preview job rendering the memory wasn't recycled leading
to a memory leak. For background rendering we already recycled memory in
a correct way. This change enables the same branch during preview
rendering.
Also adds a better `VKDevice::debug_print` to see the resources being
tracked by the different threads and resource pools.
Pull Request: https://projects.blender.org/blender/blender/pulls/128377
Pipeline pool could log to much information that confused developers who
are not up to date what pipelines are. This PR will hide the confusing
messages. When working on Vulkan these messages can still be shown by
raising the log level.
See !128254
Pull Request: https://projects.blender.org/blender/blender/pulls/128352
This rare GPU has z-fighting issues in editor mode. Might be fixable by
changing the bias, but would decrease precision on other platforms as
well. Better to move this GPU to limited support. It is working, just
has some drawing artifacts.
See #128179
Pull Request: https://projects.blender.org/blender/blender/pulls/128351
When exiting the immediate buffers are discarded, but where not
destroyed making the buffers still leak.
Detected when looking into descriptor set freeze issue.
Pull Request: https://projects.blender.org/blender/blender/pulls/128249
A driver (package) installed by the user can have many different drivers
and they can all report a different version. For AMD the version we
reported was from their Vulkan driver. This version isn't useful during
bug triaging.
This PR will use the driver info and driver name from the driver
properties to construct a driver version string that will be used for
reporting.
Pull Request: https://projects.blender.org/blender/blender/pulls/128232
Immediate mode uses the old 'resource tracker' which has been replaced
by swap chain resource pools. This PR optimizes immediate mode buffers
by utilizing resource pools.
Pull Request: https://projects.blender.org/blender/blender/pulls/128188
Currently the log only contained the first compatible device. It is
more important to the user to know which device is used.
This PR increases the level of the first compatible device so it is only
visible when increasing the log level. It reports the device, driver and
vendor when starting blender with `--debug-gpu`.
Pull Request: https://projects.blender.org/blender/blender/pulls/128168
Resource binding was over-complicated as I didn't understood the state
manager and vulkan to make the correct decisions at that time. This
refactor will remove a lot of the complexity and improves the performance.
**Performance**
The performance improvement is noticeable in complex grease pencil
scenes.
Grease pencil benchmark file picknick:
- `NVIDIA Quadro RTX 6000` 17 fps -> 24 fps
- `Intel(R) Arc(tm) A750 Graphics (DG2)` 6 -> 21 fps
**Bottle-neck**
The performance improvements originates from moving the update entry
point from state manager to shader interface. The previous implementation
(state manager) had to loop over all the bound resources and find in the
shader interface where it was located in the descriptor set. Ignoring
resources that were not used by the shader. But also making it hard
to determine if descriptor sets actually changed. Previous implementation
assumed descriptor sets always changed.
When descriptor set changed a new descriptor set needed to be allocated.
Most drivers this is a fast operation, but on Intel/Mesa this was measurable
slow. Using an allocation pool doesn't fit the Vulkan API as you are only
able to reuse when the layout matches exactly. Of course doable, but requires
another structure to keep track of the actual layouts.
**Solution**
By using the shader interface as entry point we can:
1. Keep track if there are any changes in the state manager. If not and the
layout is the same, the previous shader can be reused.
2. In stead of looping over each bound resource, we loop over bind points.
**Future extensions**
Bundle all descriptor set uploads just before use. This would be more
in line with how 'modern' Vulkan should be implemented. This PR already
separates the uploading from the updating and technically allows to upload
more than one descriptor set.
Instead of looking 1 set back we should measure if we can handle multiple
or keep track of the different layouts resources to improve the performance
even further.
Optional use `VK_KHR_descriptor_buffer` when available.
Pull Request: https://projects.blender.org/blender/blender/pulls/128068
Takes into account any offset that must be added to the vertex index
(usually supplied as baseVertex or startVertex in the Metal draw call)
in the code that emulates the SSBO vertex fetch.
Authored by Apple: James McCarthy
Pull Request: https://projects.blender.org/blender/blender/pulls/127864
Adds antialiasing to curve's handles and thickness to active ones.
Also handles now react to
`Preferences > Interface > Display > Resolution Scale` and
`Preferences > Themes > 3D Viewport > Edge Width` as they do in
legacy curves.
Pull Request: https://projects.blender.org/blender/blender/pulls/122910
GPU resources created during Light probe bake job were added to discard pool, but the pool itself was never notified by worker thread to release resources.
Bake job creates dedicated `GPUContext` for its needs and later deletes it within the same thread.
Pull Request: https://projects.blender.org/blender/blender/pulls/127977
Removes two levels of indirection when updating descriptor sets.
These are the easy ones to remove. Others will be removed in
a future PR.
This is part of reworking of how descriptor sets are used.
This PR Mostly reduces complexity.
Pull Request: https://projects.blender.org/blender/blender/pulls/127915
De-interleaved vertex buffers offsets the attribute in the buffer to
the de-interleaved position. The vertex attribute offset is limited by a
constrained and would raise an error when the buffers just a bit larger.
*VUID-VkVertexInputAttributeDescription-offset-00622*: offset must
be less than or equal to `VkPhysicalDeviceLimits::maxVertexInputAttributeOffset`
This PR fixes this by offsetting the buffer in stead of the attribute.
Offsetting buffers is limited by the amount of memory.
Pull Request: https://projects.blender.org/blender/blender/pulls/128031
Allows users to override the auto detection for GPU
selection. Normally the GPU selection is done by looping
over the order Vulkan provides and finding the highest
performing device based on its type (discrete, integrated,
software).
However users might have multiple discrete cards and want
to switch between them. Or developers want to validate other
GPUs without rebooting.
This PR adds the ability to override the auto detection
for the vulkan backend.

**Future improvements**:
- This PR does not include a command line option. This can be added
later for render farms.
Pull Request: https://projects.blender.org/blender/blender/pulls/127860