Ensure attachment states and load/store configs don't get out of sync
with the framebuffer layout.
In theory, a Framebuffer could have empty attachments interleaved with
valid ones so checking just the attachments "length" is not enough.
What this does instead is to ensure that valid attachments have a valid
config and that null attachments either don't have a matching config or
have an IGNORE/DONT_CARE one.
Pull Request: https://projects.blender.org/blender/blender/pulls/117073
Allows specification of per-shader threadgroup memory tuning
to optimise performance through increase of GPU occupancy.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/115238
This PR adds support for specialization constants for the OpenGL
backend. The minimum OpenGL version we are targetting doesn't
have native support for specialization constants. We simulate this
by keeping track of shader programs for each set of specialization
constants that are being used.
Specialization constants can be used to reduce shader complexity
and improve performance as less registry and/or spilling is done.
This requires the ability to recompile GLShaders. In order to do this
we need to keep track of the sources that are used when the shader
was compiled. For static sources we only store references
(`GLSource::source_ref`), for dynamically generated sources we keep
a copy of the source (`GLSource::source`).
When recompiling the shader GLSL source-code is generated for the
constants stored in `Shader::constants`. When compiling the previous
GLSource that contains specialization constants is then replaced
by the new version.
Pull Request: https://projects.blender.org/blender/blender/pulls/116926
This avoid the cost of creating the tiles themselves which uses a lot
texture write. This was a bottleneck on Apple GPUs.
Also the per pixel classification allows us to remove certain checks in
the deferred lighting shader making it faster.
### TODO
- [x] Add gl_FragStencilRefARB support on other backend
- [x] Add workaround for when gl_FragStencilRefARB isnt supported
Pull Request: https://projects.blender.org/blender/blender/pulls/116704
Along with the 4.1 libraries upgrade, we are bumping the clang-format
version from 8-12 to 17. This affects quite a few files.
If not already the case, you may consider pointing your IDE to the
clang-format binary bundled with the Blender precompiled libraries.
This adds some `#line` directive between the
source file injection so that the log parser knowns
which file the errors originated from.
This is then followed by a scan over the combined
source to find out the real row number.
This needed some changes in the `Shader::plint_log`
to skip lines to avoid outputing redundant information.
Adds API to allow usage of specialization constants in shaders.
Specialization constants are dynamic runtime constants which can
be compiled into a shader pipeline state object (PSO) to improve
runtime performance by reducing shader complexity through
shader compiler constant-folding.
This API allows specialization constant values to be specified
along with a default value if no constant value has been declared.
Each GPU backend is then responsible for caching PSO permutations
against the current specialization configuration.
This patch adds support for specialization constants in the
Metal backend and provides a generalised high-level solution
which can be adopted by other graphics APIs supporting
this feature.
Authored by Apple: Michael Parkin-White
Authored by Blender: Clément Foucault (files in gpu/test folder)
Pull Request: https://projects.blender.org/blender/blender/pulls/115193
Remove most includes of this header inside other headers, to remove unnecessary
indirect includes which can have a impact on compile times. In the future we may
want more dedicated "_fwd.hh" headers, but until then, this sticks with the
solution in existing code.
Unfortunately it isn't yet possible to remove the include from `BKE_geometry_set.hh`.
Each value is now out of the global namespace, so they can be shorter
and easier to read. Most of this commit just adds the necessary casting
and namespace specification. `enum class` can be forward declared since
it has a specified size. We will make use of that in the next commit.
This patch adds an alternative path for devices/OSs
which do not support native texture atomics in Metal.
Support is encapsulated within the backend, ensuring
any allocated texture with the USAGE_ATOMIC flag is
allocated with a backing buffer, upon which atomic
operations happen.
The shader generation is also changed for the atomic
case, which instructs the backend to insert additional
buffer bind-points for the buffer resource. As Metal
also only supports buffer-backed textures for
textureBuffers or 2D textures, TextureArrays and
3D textures are emulated within a 2D texture, with
sample locations being indirected.
All usage of atomic textures MUST now utilise the
correct atomic texture types in the high level shader
and GPUShaderCreateInfo declarations.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/115956
This PR introduced some filters to improve the workflow when using
shader_builder. Shader builder is used to validate shader compilation
during buildtime and can be enabled using `WITH_GPU_BUILDTIME_SHADER_BUILDER`.
During backend development shader builder is also handy as you can
pin-point it to the shader/backend you're focusing on. Without filters
you would insert temporary code to break on a specific shader.
* `--gpu-backend` can be used to only check a specific backend.
possible values are `vulkan`, `metal` or `opengl`. When argument
isn't passed, all backends will be validated.
* `--gpu-shader-filter` can be used to only check a subset or indivisual
shader. The filter is a name starts with filter. Use
`--gpu-shader-filter eevee` to validate all eevee shaders
Pull Request: https://projects.blender.org/blender/blender/pulls/115888
NDEBUG is part of the C standard and disables asserts. Only this will
now be used to decide if asserts are enabled.
DEBUG was a Blender specific define, that has now been removed.
_DEBUG is a Visual Studio define for builds in Debug configuration.
Blender defines this for all platforms. This is still used in a few
places in the draw code, and in external libraries Bullet and Mantaflow.
Pull Request: https://projects.blender.org/blender/blender/pulls/115774
Flag enables backends to differentiate between a framebuffer
bind with a custom loadstore state and a standard bind.
For Metal, this resolves an ambiguous complexity about loading
or clearing attachments by only flagging the first bind call as
explicit.
This means if a framebuffer is re-bound by a secondary code-path,
the re-started render-pass will not perform a secondary load. This
now allows explicit clear state to be specified on any attachment
type. Previously only memoryless attachments supported this.
To avoid further complexity, usage of`GPU_framebuffer_clear_* `
calls in conjunction with `GPU_framebuffer_bind_ex` will now
trigger an assertion failure.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/115734
Avoid reusing the custom data type enum with additional values. Instead
use std::variant and type names to properly distinguish between custom
and generic attribute requests. Use a Vector to hold the requests.
Also attempt to simplify the string key building process for requests
and groups of requests in batches. Previously for every PBVH node it
would rebuild the key 3 times, now it only does it once. It's hard to
measure, but that process did show up in profiles, so performance is
probably slightly improved when many nodes are handled at once.
Prefer the name 'hit_result' since 'result' was sometimes used for
a vector of GPUSelectResult and is often used a functions return value.
Use hit_results for the span/vector and hit_result for a single hit.
Also assign struct members for new GPUSelectResult as it reads better
and avoids depending on struct order.
This is caused by some path not setting the proper usage flag
on the main depth stencil texture making impossible to use
stencil view of this texture.
Make this flag mandatory for offscreen buffer fixes the issue.
Pull Request: https://projects.blender.org/blender/blender/pulls/115361
Texture usage flag `GPU_TEXTURE_USAGE_MIP_SWIZZLE_VIEW`
was originally implemented and used too conservatively for many
cases in which the underlying API flags were not required.
Renaming to `GPU_TEXTURE_USAGE_FORMAT_VIEW` to reflect
the only essential use case for when a texture view is initialized with
a different texture format to the source texture. Texture views can
still be created without this flag when mip range or base level is
adjusted,
This flag is still required by stencil views and internally by the Metal
backend for certain feature support such as SRGB render toggling.
Patch also includes some small changes to the Metal backend to
adapt to this new compatibility and correctly capture all texture view
use-cases.
Related to #115269
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/115300
PR Introduces GPU_storagebuf_sync_to_host as an explicit routine to
flush GPU-resident storage buffer memory back to the host within the
GPU command stream.
The previous implmentation relied on implicit synchronization of
resources using OpenGL barriers which does not match the
paradigm of explicit APIs, where indiviaul resources may need
to be tracked.
This patch ensures GPU_storagebuf_read can be called without
stalling the GPU pipeline while work finishes executing. There are
two possible use cases:
1) If GPU_storagebuf_read is called AFTER an explicit call to
GPU_storagebuf_sync_to_host, the read will be synchronized.
If the dependent work is still executing on the GPU, the host
will stall until GPU work has completed and results are available.
2) If GPU_storagebuf_read is called WITHOUT an explicit call to
GPU_storagebuf_sync_to_host, the read will be asynchronous
and whatever memory is visible to the host at that time will be used.
(This is the same as assuming a sync event has already been signalled.)
This patch also addresses a gap in the Metal implementation where
there was missing read support for GPU-only storage buffers.
This routine now uses a staging buffer to copy results if no
host-visible buffer was available.
Reading from a GPU-only storage buffer will always stall
the host, as it is not possible to pre-flush results, as no
host-resident buffer is available.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/113456
With the shift to GPU-driven rendering pipeline,
the SSBO vertex fetch paradigm used to
implement workbench shadows on Metal
instead of utilising the geometry shader
path no longer worked correctly.
This is because the draw submission
required vertex amplification up-front,
based on the expected output geometry
amount for a given input geometry.
This patch aims to resolve this
issue through addition of API to
enable the features within the
GPU driven pipeline.
Co-authored-by: Michael Parkin-White <mparkinwhite@apple.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/113498