`GLBatch::draw_indirect` has additional overhead compared to
`GLBatch::draw`, and can become a bottleneck in scenes that require
many draw calls (ie. with too many unique meshes).
The performance difference is almost exclusively caused by the
`GL_COMMAND_BARRIER_BIT` barrier that happens on every call.
This PR adds a `GPU_storagebuf_sync_as_indirect_buffer` function that
can be used to place the barrier only once after filling the indirect
buffer content.
This function is a no-op in Vulkan and Metal since they don't need the
barrier.
Pull Request: https://projects.blender.org/blender/blender/pulls/117561
While investigating Blender compilation time for windows-arm64, we
identified two compilation units that were taking a long time to compile
(~1h each). This affects windows-x64 builds as well.
Pull Request: https://projects.blender.org/blender/blender/pulls/117534
The term `PIL` stands for "platform independent library." It exists since the `Initial Revision`
commit from 2002. Nowadays, we generally just use the `BLI` (blenlib) prefix for such code
and the `PIL` prefix feels more confusing then useful. Therefore, this patch renames the
`PIL` to `BLI`.
Pull Request: https://projects.blender.org/blender/blender/pulls/117325
Ensure attachment states and load/store configs don't get out of sync
with the framebuffer layout.
In theory, a Framebuffer could have empty attachments interleaved with
valid ones so checking just the attachments "length" is not enough.
What this does instead is to ensure that valid attachments have a valid
config and that null attachments either don't have a matching config or
have an IGNORE/DONT_CARE one.
Pull Request: https://projects.blender.org/blender/blender/pulls/117073
Allows specification of per-shader threadgroup memory tuning
to optimise performance through increase of GPU occupancy.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/115238
This PR adds support for specialization constants for the OpenGL
backend. The minimum OpenGL version we are targetting doesn't
have native support for specialization constants. We simulate this
by keeping track of shader programs for each set of specialization
constants that are being used.
Specialization constants can be used to reduce shader complexity
and improve performance as less registry and/or spilling is done.
This requires the ability to recompile GLShaders. In order to do this
we need to keep track of the sources that are used when the shader
was compiled. For static sources we only store references
(`GLSource::source_ref`), for dynamically generated sources we keep
a copy of the source (`GLSource::source`).
When recompiling the shader GLSL source-code is generated for the
constants stored in `Shader::constants`. When compiling the previous
GLSource that contains specialization constants is then replaced
by the new version.
Pull Request: https://projects.blender.org/blender/blender/pulls/116926
This avoid the cost of creating the tiles themselves which uses a lot
texture write. This was a bottleneck on Apple GPUs.
Also the per pixel classification allows us to remove certain checks in
the deferred lighting shader making it faster.
### TODO
- [x] Add gl_FragStencilRefARB support on other backend
- [x] Add workaround for when gl_FragStencilRefARB isnt supported
Pull Request: https://projects.blender.org/blender/blender/pulls/116704
Along with the 4.1 libraries upgrade, we are bumping the clang-format
version from 8-12 to 17. This affects quite a few files.
If not already the case, you may consider pointing your IDE to the
clang-format binary bundled with the Blender precompiled libraries.
This adds some `#line` directive between the
source file injection so that the log parser knowns
which file the errors originated from.
This is then followed by a scan over the combined
source to find out the real row number.
This needed some changes in the `Shader::plint_log`
to skip lines to avoid outputing redundant information.
Adds API to allow usage of specialization constants in shaders.
Specialization constants are dynamic runtime constants which can
be compiled into a shader pipeline state object (PSO) to improve
runtime performance by reducing shader complexity through
shader compiler constant-folding.
This API allows specialization constant values to be specified
along with a default value if no constant value has been declared.
Each GPU backend is then responsible for caching PSO permutations
against the current specialization configuration.
This patch adds support for specialization constants in the
Metal backend and provides a generalised high-level solution
which can be adopted by other graphics APIs supporting
this feature.
Authored by Apple: Michael Parkin-White
Authored by Blender: Clément Foucault (files in gpu/test folder)
Pull Request: https://projects.blender.org/blender/blender/pulls/115193
Remove most includes of this header inside other headers, to remove unnecessary
indirect includes which can have a impact on compile times. In the future we may
want more dedicated "_fwd.hh" headers, but until then, this sticks with the
solution in existing code.
Unfortunately it isn't yet possible to remove the include from `BKE_geometry_set.hh`.
Each value is now out of the global namespace, so they can be shorter
and easier to read. Most of this commit just adds the necessary casting
and namespace specification. `enum class` can be forward declared since
it has a specified size. We will make use of that in the next commit.
This patch adds an alternative path for devices/OSs
which do not support native texture atomics in Metal.
Support is encapsulated within the backend, ensuring
any allocated texture with the USAGE_ATOMIC flag is
allocated with a backing buffer, upon which atomic
operations happen.
The shader generation is also changed for the atomic
case, which instructs the backend to insert additional
buffer bind-points for the buffer resource. As Metal
also only supports buffer-backed textures for
textureBuffers or 2D textures, TextureArrays and
3D textures are emulated within a 2D texture, with
sample locations being indirected.
All usage of atomic textures MUST now utilise the
correct atomic texture types in the high level shader
and GPUShaderCreateInfo declarations.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/115956
This PR introduced some filters to improve the workflow when using
shader_builder. Shader builder is used to validate shader compilation
during buildtime and can be enabled using `WITH_GPU_BUILDTIME_SHADER_BUILDER`.
During backend development shader builder is also handy as you can
pin-point it to the shader/backend you're focusing on. Without filters
you would insert temporary code to break on a specific shader.
* `--gpu-backend` can be used to only check a specific backend.
possible values are `vulkan`, `metal` or `opengl`. When argument
isn't passed, all backends will be validated.
* `--gpu-shader-filter` can be used to only check a subset or indivisual
shader. The filter is a name starts with filter. Use
`--gpu-shader-filter eevee` to validate all eevee shaders
Pull Request: https://projects.blender.org/blender/blender/pulls/115888
NDEBUG is part of the C standard and disables asserts. Only this will
now be used to decide if asserts are enabled.
DEBUG was a Blender specific define, that has now been removed.
_DEBUG is a Visual Studio define for builds in Debug configuration.
Blender defines this for all platforms. This is still used in a few
places in the draw code, and in external libraries Bullet and Mantaflow.
Pull Request: https://projects.blender.org/blender/blender/pulls/115774
Flag enables backends to differentiate between a framebuffer
bind with a custom loadstore state and a standard bind.
For Metal, this resolves an ambiguous complexity about loading
or clearing attachments by only flagging the first bind call as
explicit.
This means if a framebuffer is re-bound by a secondary code-path,
the re-started render-pass will not perform a secondary load. This
now allows explicit clear state to be specified on any attachment
type. Previously only memoryless attachments supported this.
To avoid further complexity, usage of`GPU_framebuffer_clear_* `
calls in conjunction with `GPU_framebuffer_bind_ex` will now
trigger an assertion failure.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/115734
Avoid reusing the custom data type enum with additional values. Instead
use std::variant and type names to properly distinguish between custom
and generic attribute requests. Use a Vector to hold the requests.
Also attempt to simplify the string key building process for requests
and groups of requests in batches. Previously for every PBVH node it
would rebuild the key 3 times, now it only does it once. It's hard to
measure, but that process did show up in profiles, so performance is
probably slightly improved when many nodes are handled at once.
Prefer the name 'hit_result' since 'result' was sometimes used for
a vector of GPUSelectResult and is often used a functions return value.
Use hit_results for the span/vector and hit_result for a single hit.
Also assign struct members for new GPUSelectResult as it reads better
and avoids depending on struct order.
This is caused by some path not setting the proper usage flag
on the main depth stencil texture making impossible to use
stencil view of this texture.
Make this flag mandatory for offscreen buffer fixes the issue.
Pull Request: https://projects.blender.org/blender/blender/pulls/115361
Texture usage flag `GPU_TEXTURE_USAGE_MIP_SWIZZLE_VIEW`
was originally implemented and used too conservatively for many
cases in which the underlying API flags were not required.
Renaming to `GPU_TEXTURE_USAGE_FORMAT_VIEW` to reflect
the only essential use case for when a texture view is initialized with
a different texture format to the source texture. Texture views can
still be created without this flag when mip range or base level is
adjusted,
This flag is still required by stencil views and internally by the Metal
backend for certain feature support such as SRGB render toggling.
Patch also includes some small changes to the Metal backend to
adapt to this new compatibility and correctly capture all texture view
use-cases.
Related to #115269
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/115300