Texture usage flag `GPU_TEXTURE_USAGE_MIP_SWIZZLE_VIEW`
was originally implemented and used too conservatively for many
cases in which the underlying API flags were not required.
Renaming to `GPU_TEXTURE_USAGE_FORMAT_VIEW` to reflect
the only essential use case for when a texture view is initialized with
a different texture format to the source texture. Texture views can
still be created without this flag when mip range or base level is
adjusted,
This flag is still required by stencil views and internally by the Metal
backend for certain feature support such as SRGB render toggling.
Patch also includes some small changes to the Metal backend to
adapt to this new compatibility and correctly capture all texture view
use-cases.
Related to #115269
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/115300
Ensure that on closing of the application, all pending
SafeFreeLists are released. A new change to ensure
release of SafeFreeLists was always deferred until
full completion of GPU buffers meant that a pending
MTLSafeFreeList container may not be released on
shutdown.
This change ensures that the container is fully
destructed on final shutdown.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/114532
The null byte wasn't taken into account when allocating memory
to strcpy into.
The calculation to check if allocation was needed was also wrong,
causing allocation for every string.
In practice it's not so likely users would ever hit this since
the function tended to over allocate, even in the case an off by one
error occurred, in all likelihood the room would already be available.
Ref !114512
After previous changes to allow command buffers to not require
execution and completion in submission order,
guarantees for releasing freed buffers back to the memory pool
within the frame life time had changed.
This could mean a released buffer could be returned to the
memory pool prematurely, if a subsequent command buffer
completes before a previously submitted one, flagging a resource as no
longer in use by the GPU, while it still may be in use by the orignal
command buffer.
This PR defers final reference count release for buffers
being actively used until the following call to GPU_render_step,
to ensure that buffers freed will be available for the lifetime of
the frame, covering all command submissions, rather than just
within the lifetime of the command buffer submission within which a
buffer was freed.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/114329
The last good commit was 8474716abb.
After this commits from main were pushed to blender-v4.0-release. These are
being reverted.
Commits a4880576dc from to b26f176d1a that happend afterwards were meant for
4.0, and their contents is preserved.
PR Introduces GPU_storagebuf_sync_to_host as an explicit routine to
flush GPU-resident storage buffer memory back to the host within the
GPU command stream.
The previous implmentation relied on implicit synchronization of
resources using OpenGL barriers which does not match the
paradigm of explicit APIs, where indiviaul resources may need
to be tracked.
This patch ensures GPU_storagebuf_read can be called without
stalling the GPU pipeline while work finishes executing. There are
two possible use cases:
1) If GPU_storagebuf_read is called AFTER an explicit call to
GPU_storagebuf_sync_to_host, the read will be synchronized.
If the dependent work is still executing on the GPU, the host
will stall until GPU work has completed and results are available.
2) If GPU_storagebuf_read is called WITHOUT an explicit call to
GPU_storagebuf_sync_to_host, the read will be asynchronous
and whatever memory is visible to the host at that time will be used.
(This is the same as assuming a sync event has already been signalled.)
This patch also addresses a gap in the Metal implementation where
there was missing read support for GPU-only storage buffers.
This routine now uses a staging buffer to copy results if no
host-visible buffer was available.
Reading from a GPU-only storage buffer will always stall
the host, as it is not possible to pre-flush results, as no
host-resident buffer is available.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/113456
With the shift to GPU-driven rendering pipeline,
the SSBO vertex fetch paradigm used to
implement workbench shadows on Metal
instead of utilising the geometry shader
path no longer worked correctly.
This is because the draw submission
required vertex amplification up-front,
based on the expected output geometry
amount for a given input geometry.
This patch aims to resolve this
issue through addition of API to
enable the features within the
GPU driven pipeline.
Co-authored-by: Michael Parkin-White <mparkinwhite@apple.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/113498
Changes to ensure all supported texture tests are passing with the
Metal backend and add additional tests to cover texture_3d and
texture 1d test cases.
Authored by Apple: Michael Parkin-White
Co-authored-by: Michael Parkin-White <mparkinwhite@apple.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/113889
With the shift to GPU-driven rendering pipeline,
the SSBO vertex fetch paradigm used to
implement workbench shadows on Metal
instead of utilising the geometry shader
path no longer worked correctly.
This is because the draw submission
required vertex amplification up-front,
based on the expected output geometry
amount for a given input geometry.
This WIP patch aims to resolve this
issue through addition of API to
enable the features within the
GPU driven pipeline.
Co-authored-by: Michael Parkin-White <mparkinwhite@apple.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/113498
This patch improves the heuristic used to
determine maximum frames-in-flight at
different input latency levels.
Previous values adjusted to better encapsulate
workloads in the 10-20 FPS range which can
manifest excessive latency. Changes improve
total performance throughput and improve
the interactive user experience.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/113020
Optimization of EEVEE Next's Virtual Shadow Maps for TBDRs.
The core of these optimizations lie in eliminating use of
atomic shadow atlas writes and instead utilise tile memory to
perform depth accumulation as a secondary pass once all
geometry updates for a given shadow view have been updated.
This also allows use of fast on-tile depth testing/sorting, reducing
overdraw and redundant fragment operations, while also allowing
for tile indirection calculations to be offloaded into the vertex
shader to increase fragment storage efficiency and throughput.
Authored by Apple: Michael Parkin-White
Co-authored-by: Michael Parkin-White <mparkinwhite@apple.com>
Co-authored-by: Clément Foucault <foucault.clem@gmail.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/111283
This adds basic emulation of the subpass input feature
of vulkan and to a lower extend Raster Order Group on Metal.
This help test paths that might use this feature in the future
(like shadow rendering) on all platform and or simplify higher
level code for supporting older hardware.
This add clear description to the load/store ops and to the
new `GPUAttachementState`.
The OpenGL backend will correctly mask un-writable
attachments and will bind as texture readable attachments.
Even if possible by the vulkan standard, the GPU API prohibit
the read and write to the same attachment inside the same
subpass.
In the GL backend, this is implemented using `glTextureBarrier`
and `texelFetch` as it is described in the ARB_texture_barrier
extension.
https://registry.khronos.org/OpenGL/extensions/ARB/ARB_texture_barrier.txt
Pull Request: https://projects.blender.org/blender/blender/pulls/112051
Flickering caused by in-flight SSBO data
overwrites has been resolved by ensuring
data updates go into a new buffer while
existing data is in flight.
GPU_finish has also been removed from
SSBO read due to its frequent mid-frame use
limiting performance.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/113019
Texture Atomics have been added in Metal 3.1
and enable the original implementations of
shadow update and irradiance cache baking.
However, a fallback solution will be
required for versions under macOS 14.0 utilising
buffer-backed textures instead.
This patch also includes a stub implementation if
building/running on older macOS versions which
provides locally-synchronized texture access in
place of atomics. This enables some effects to be
partially tested, and ensures non-guarded use
of imageAtomic functions does not result
in compilation failure.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/112866
Blender 4.0 requires OpenGL 4.3 which always support SSBO's.
Platforms that don't support enough SSBO bind points will be marked
as unsupported.
Users who start Blender on those platforms will be informed via a
dialog. This PR also updates the `--debug-gpu-force-workarounds`
to match our minimum requirements. Note that some bugs are still
there that should be solved in other PRs:
* Workbench only renders the object using a unit matrix this is because
there is a bug in the workaround for shader_draw_parameters
* Navigating with middle mouse button is not working. Unsure what the
cause is, but might be a missing feature check in the OpenGL backend.
Related to #112224
Pull Request: https://projects.blender.org/blender/blender/pulls/112572
The NDEBUG is a toggle define, and in debug builds it is not defined.
This change solves the warning
mtl_context.mm:2282:5: warning: 'NDEBUG' is not defined, evaluates to 0 [-Wundef]
Pull Request: https://projects.blender.org/blender/blender/pulls/112504
Enables performance optimizations and new rendering
features through allowing colour attachments to be
tagged with a rasterization order group. This means that
all accesses to the same pixel location are guaranteed
to happen in submission order.
This lays the ground work for tile-based architecture
optimizations, enabling additional features for
deferred rendering which allow subsequent passes
which operate on pixel data to occur in order, while
memory remains on-tile.
This patch allows fragment outputs to be tagged
with a raster order group and also adds a new
FragmentTileIn parameter to a shader, which allows
speciication of incoming parameters from a previous
draw.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/111748
Enhance custom framebuffer binding state to allow
specification of clear color as part of the loadstore
state at framebuffer bind time.
This ensures all parameters controlling attachment
loading and storage behaviour can be explicitly
specified when binding a framebuffer.
This change enables optimizations which leverage
explicit framebuffer load store state to also specify
a clear color without prematurely triggering a
clear which may occur independently to
render work when using GPU_framebuffer_clear(..).
Authored by Apple: Michael Parkin-White.
Pull Request: https://projects.blender.org/blender/blender/pulls/111810
Removes old unused bind state parameters from
prior to the addition of explicit resource location support.
Patch also ensures compute state is reset between each
encoder and adds a debug option to dispatch each
compute command within its own encoder for debugging
synchronization issues.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/111753
Memoryless textures are only used as intermediate attachments
during rasterization, but do not have any backing storage. This is
particularly useful if a virutal framebuffer is needed, or, there is
a situation where a depth buffer is only needed within the pass
itself and the results are discarded once the pass completes.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/111749
The hash tables and vector blenlib headers were pulling many more
headers than they actually need, including the C base math header,
our C string API header, and the StringRef header. All of this
potentially slows down compilation and polutes autocomplete
with unrelated information.
Also remove the `ListBase` constructor for `Vector`. It wasn't used
much, and making it easy to use `ListBase` isn't worth it for the
same reasons mentioned above.
It turns out a lot of files depended on indirect includes of
`BLI_string.h` and `BLI_listbase.h`, so those are fixed here.
Pull Request: https://projects.blender.org/blender/blender/pulls/111801
When GLSL sources were first included in Blender they were treated as
data (like blend files) and had no license header.
Since then GLSL has been used for more sophisticated features
(EEVEE & real-time compositing)
where it makes sense to include licensing information.
Add SPDX copyright headers to *.glsl files, matching headers used for
C/C++, also include GLSL files in the license checking script.
As leading C-comments are now stripped,
added binary size of comments is no longer a concern.
Ref !111247