This allows multiple threads to request different specializations without
locking usage of all specialized shaders program when a new specialization
is being compiled.
The specialization constants are bundled in a structure that is being
passed to the `Shader::bind()` method. The structure is owned by the
calling thread and only used by the `Shader::bind()`.
Only querying for the specialized shader (Map lookup) is locking the shader
usage.
The variant compilation is now also locking and ensured that
multiple thread trying to compile the same variant will never result
in race condition.
Note that this removes the `is_dirty` optimization. This can be added
back if this becomes a bottleneck in the future. Otherwise, the
performance impact is not noticeable.
Pull Request: https://projects.blender.org/blender/blender/pulls/136991
This feature greatly increase depth buffer precision.
This is very noticeable in large view distance scenes.
This is enabled by default on GPUs that supports it (most of the hardware we
support already supports this). This makes rendering different on the GPUs
that do not support that feature (`glClipControl`).
While this give much better depth precision as before, we also have a lot of
imprecision caused by our vertex transformations. This can be improved in
another task.
Pull Request: https://projects.blender.org/blender/blender/pulls/138898
This commit finishes removing the uses of the integer to float
vertex buffer fetch mode. Previous commits noted below already started
that process. The last usage was geometry attributes. Now integers are
converted to floats as part of the existing upload process.
The change makes the Vulkan vertex buffer type conversion unused, so
it's removed. That's nice because Vulkan vertex buffers go from 1040 to
568 bytes in size and have significantly less overhead on creation.
Related:
- 153abc372e
- 1e1ac2bb9b
- 617858e453
Pull Request: https://projects.blender.org/blender/blender/pulls/138873
Part of #136993.
Share as much of the ShaderCompiler implementations as possible.
Remove the ShaderCompiler/ShaderCompilerGeneric split and make most of
its functions non virtual.
Move the `get_compiler` function from `Context` to `GPUBackend` and
creation/deletion to `GPUBackend::init/delete_resources`.
Add a `batch_cancel` function to `ShaderCompiler` (needed for the
GPUPass refactor).
As a nice extra, the multithreaded OpenGL compilation has become faster
too.
The barbershop materials + EEVEE static shaders have gone from 27s to
22s.
I have not observed any performance difference on Vulkan or Metal.
Pull Request: https://projects.blender.org/blender/blender/pulls/136676
The use-case of this blend mode is to be able to make parts of an
viewport overlay transparent.
The future user of this blend mode is sequencer preview drawing
where frame will be drawn to an HDR render frame-buffer, and overlays
drawn on-top. In a way it is similar to the image engine, but without
need to have custom shader.
Ref #138094
Pull Request: https://projects.blender.org/blender/blender/pulls/138307
This is completely unused, not implemented for the Vulkan backend, and
seems to add quite a bit of complexity to the Metal and OpenGL backends.
It was added for EEVEE legacy motion blur, and the last use was removed
along with EEVEE legacy. We're probably better off not maintaining it since
we should avoid duplicating vertex buffer data anyway.
Pull Request: https://projects.blender.org/blender/blender/pulls/138226
* Pixel buffer is always allocated with export and dedicated memory flags.
* Returns an opaque file descriptor (Unix) or handle (Windows).
* Native handle now includes memory size as it may be slightly bigger
than the requested size.
Pull Request: https://projects.blender.org/blender/blender/pulls/137363
This makes no sense to have in the draw namespace.
Also take the opportunity for making the coordinates
a float2 and rename them to something more descriptive.
This unify the C++ and GLSL codebase style.
The GLSL types are still in the backend compatibility
layers to support python shaders. However, the C++
shader compilation layer doesn't have them to enforce
correct type usage.
Note that this is going to break pretty much all PRs
in flight that targets shader code.
Rel #137261
Pull Request: https://projects.blender.org/blender/blender/pulls/137369
They are actually already some literals with the `f` suffix
that are in our shader codebase and we never had problem in
the past 5 years (or even 8 years).
So I think it is safe to do and improves convergence of codestyles.
Pull Request: https://projects.blender.org/blender/blender/pulls/137352
Update the `ShaderCompilerGeneric` to support deferred compilation
using the batch compilation API, so we can get rid of
`drw_manager_shader`.
This approach also allows supporting non-blocking compilation
for static shaders.
This shouldn't cause any behavior changes at the moment, since batch
compilation is not yet used when parallel compilation is disabled.
This adds a `GPUWorker` and a `GPUSecondaryContext` as an easy to use
wrapper for managing secondary GPU contexts.
(Part of #133674)
Pull Request: https://projects.blender.org/blender/blender/pulls/136518
This is getting in the way of making the
GPUShader API more threadsafe.
This getter already doesn't work for vulkan
and Metal, and has very limited usage.
Keeping the python function to avoid errors
and display a deprecation warning.
Pull Request: https://projects.blender.org/blender/blender/pulls/136983
This patch supports GPU OIDN denoising in the compositor. A new
compositor performance option was added to allow choosing between CPU,
GPU, and Auto device selection. Auto will use whatever the compositor is
using for execution.
The code is two folds, first, denoising code was adapted to use buffers
as opposed to passing in pointers to filters directly, this is needed to
support GPU devices. Second, device creation is now a bit more involved,
it tries to choose the device is being used by the compositor for
execution.
Matching GPU devices is done by choosing the OIDN device that matches
the UUID or LUID of the active GPU platform. We need both UUID and LUID
because not all platforms support both. UUID is supported on all
platforms except MacOS Metal, while LUID is only supported on Window and
MacOS metal.
If there is no active GPU device or matching is unsuccessful, we let
OIDN choose the best device, which is typically the fastest.
To support this case, UUID and LUID identifiers were added to the
GPUPlatformGlobal and are initialized by the GPU backend if supported.
OpenGL now requires GL_EXT_memory_object and GL_EXT_memory_object_win32
to support this use case, but it should function without it.
Pull Request: https://projects.blender.org/blender/blender/pulls/136660
This allow to store the full object ID inside a `uint32`
buffer. This allows to get the per object data in deferred
passes and avoid to store object data inside the Gbuffer.
This data is only written if needed.
This had to modify the implementation of subpass input
for all backend to be able to bind layered texture.
This currently work because only the layer 0 is bound to the
framebuffer. This is fragile but I don't see a good builtin way
to fix it.
Rel #135935
#### Tasks
- [x] Replace light linking bits in Gbuffer
- [x] Replace Object ID in GBuffer for SSS
- [x] Conditional storage
- [x] Dummy storage if not needed
Pull Request: https://projects.blender.org/blender/blender/pulls/136428
This PR enabled GPU based subdivision on Metal.
Most work is done in #135296.
- Metal max storage bindings for compute shaders were never set.
Some performance figures: Suzanne 6 subdivision levels
| Machine | CPU Subdivision | GPU Subdivision |
| --------------- | --------------- | --------------- |
| M1 Studio Ultra | 7fps | 12 fps |
| M2 Air | 3fps | 11 fps |
Pull Request: https://projects.blender.org/blender/blender/pulls/135628
Subdivision shaders currently fail to compile using Metal as it doesn't recognize
packed_float3 as an internal data type. This PR includes packed_float3 as an
internal data type.
Without this `blender --debug-gpu-compile-shaders` will fail as it includes a namespace.
```
ERROR (gpu.shader): subdiv_normals_accumulate Compute Shader:
|
| source/blender/gpu/metal/mtl_shader_generator.mm:971:9: Error: no type named 'packed_float3' in 'MTLShaderComputeImpl'; did you mean simply 'packed_float3'?
|
| device MTLShaderComputeImpl::packed_float3* normals[[buffer(MTL_storage_buffer_base_index+4)]],
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| packed_float3
|
| /System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/32023/Libraries/lib/clang/32023.196/include/metal/metal_packed_vector:145:58: Note: 'packed_float3' declared here
|
| typedef __attribute__((__packed_vector_type__(3))) float packed_float3;
| ^
```
Pull Request: https://projects.blender.org/blender/blender/pulls/134925
There seems to be a pattern where this commonly failed.
This patch adds the async flush (which is effectively not async)
when there were no previous call to `async_flush_to_host`.
This is only done on Intel Macs (or any mac that has non
unified memory arch).
Pull Request: https://projects.blender.org/blender/blender/pulls/134216
Native tile input wasn't part of the MTLCapability struct, but stored locally
in the shader generator and checked in MTLFramebuffer. This PR moves it
to the MTLCapability struct and disables it when workarounds are forced.
Pull Request: https://projects.blender.org/blender/blender/pulls/133818
This allows to run with the --debug-gpu option (which
does NAN and 0xF0F0F0F0 clearing) without asserts
even when the texture atomic workaround is enabled.
This was caused by the subpass input workaround for non-tilebased
GPU using `texelFetch` on an `image`. This was supported before
the cleanup 9c0321ae9b.
But is against the GLSL specification and was removed inside the
cleanup.
Using `imageLoad` instead of `texelFetch` fixes the crash.
However rendering seems to be broken for other reasons.
Recently it came to out attention that macOs13 doesn't always work due
to texture atomics not supported by that version of the OS.
Development happens most of the time on newer versions of the OS without
ability to check if it still works on the older versions.
This PR enables to disable some Metal capabilities to better check how
Blender works on those OS's. The capabilities that will be disabled
are texture gathering and texture atomics. It doesn't disable the
capabilities that are required to start Blender, which are still
part of the `MTLCapabilities` struct.
This allows us to reproduce issues like #129571
Pull Request: https://projects.blender.org/blender/blender/pulls/133636
Framebuffers are getting freed in the GPUContext base class destructor. But
the framebuffer destructors use the MTL/VK/GLContext derived class, whose
destructor has already completed at this point. So these contexts are no
longer valid to use.
Now free the framebuffers earlier.
This caused ASAN warnings, it's not known to cause actual bugs.
Pull Request: https://projects.blender.org/blender/blender/pulls/132504
Rendering animations from Python scripts via `bpy.ops.render.opengl()`
did not trigger any of the notifications in the Metal back-end to
indicate a frame had been rendered and that the associated resources
could be released. This adds a call to GPU_render_step() after each
render. For the original asset in the bug report this reduces the high
memory watermark from 30gb to 13gb for 500 frames. 13gb is likely
still too high and therefore it is likely there are additional leaks
that need to be addressed so this should only be considered a partial
fix.
Authored by Apple: James McCarthy
Co-authored-by: James McCarthy <jamesmccarthy@apple.com>
Co-authored-by: Clément Foucault <foucault.clem@gmail.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/131085
GPUViewport is creating a bunch of framebuffer textures for itself, but
some space types never initialize/use them. E.g. Sequencer, Nodes etc.
only ever use the "overlay" texture. Eventually when viewport is
"drawn", it combines this uninitialized texture data and then only by
luck it happens that most of the time it is black. But not always!
The textures were only cleared (right now) on Metal backend, under
GPU_clear_viewport_workaround as if it was some driver workaround. Stop
doing that, and just clear them always.
However, there was seemingly a performance issue on OpenGL, when this
clear was being done. At least on my machine (Win10, Geforce RTX
3080Ti), the overhead of doing the clears is measurable, and is caused
by usage of GL4.4 glClearTexImage instead of a framebuffer clear. As if
glClearTexImage makes "pixel data to exist" on the CPU side and then
later on binding this framebuffer sends off that data to the GPU, or
somesuch.
More details in the PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/131518
The string `msl_patch_default` can have been read
partially uninitialized or initialized multiple
time and read uncomplete during multithreaded
compilation.
This should fix the GPU tests randomly failing on mac.
While this would never fail when blender runs from the UI (since
UI shaders are init in single threaded manner and always compile
before EEVEE shaders), this race condition could happen when running
EEVEE through background rendering or running tests.
Pull Request: https://projects.blender.org/blender/blender/pulls/131580