The refactor 9c0321ae9b
had the wrong mental model of the backing texture
layout for the atomic workaround.
For 3D textures, the layout is breaking the 3D texture
and reinterpreting the linear location as its 2D
linear location. This breaks the 3D texture Z slices
into non contiguous regions in 2D.
Comments have been added to avoid future confusion.
Pull Request: https://projects.blender.org/blender/blender/pulls/133830
The goal is to reduce the startup time cost of
all of these parsing and string replacement.
All comments are now stripped at compile time.
This comment check added noticeable slowdown at
startup in debug builds and during preprocessing.
Put all metadatas between start and end token.
Use very simple parsing using `StringRef` and
hash all identifiers.
Move all the complexity to the preprocessor that
massagess the metadata into a well expected input
to the runtime parser.
All identifiers are compile time hashed so that no string
comparison is made at runtime.
Speed up the source loading:
- from 10ms to 1.6ms (6.25x speedup) in release
- from 194ms to 6ms (32.3x speedup) in debug
Follow up #129009
Pull Request: https://projects.blender.org/blender/blender/pulls/128927
Move most of the string preprocessing used for MSL
compatibility to `glsl_preprocess`.
Enforce some changes like matrix constructor and
array constructor to the GLSL codebase. This is
for C++ compatibility.
Additionally reduce the amount of code duplication
inside the compatibility code.
Pull Request: https://projects.blender.org/blender/blender/pulls/128634
Takes into account any offset that must be added to the vertex index
(usually supplied as baseVertex or startVertex in the Metal draw call)
in the code that emulates the SSBO vertex fetch.
Authored by Apple: James McCarthy
Pull Request: https://projects.blender.org/blender/blender/pulls/127864
VSE timeline strips now have rounded corners. Strip corner rounding radius is
4, 6 or 8px depending on strip height (if strip is too narrow to fit
rounding, then rounding is turned off).
This is achieved with a dedicated GPU shader for drawing most of VSE
strip widget, that it could do proper rounded corner masking.
More details and images in the PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/122576
Add fast image writing and reading variants for render passes.
These variants do not perform range checking on values
and should only be used in cases where the written texel is
guaranteed to be in range. This eliminates additional
branching and simplifies shader logic.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/121116
This adds a new `Transform` type similar to cycles that reduces
the amount of data passed for a typical affine 3D transform.
This then applies this type to the light data and cleanup
all usage of the former `object_mat`. This also changes the axes
macros into utility accessor functions.
Pull Request: https://projects.blender.org/blender/blender/pulls/121089
This limits the number of tilemaps per LOD that can be fed to avoid the
easy to hit "Too many shadow updates" (#119757).
This allows for a max 64 tilemaps to be updated at once at their lowest
requested LOD (so ~10.6667 point lights if every faces of the punctual
shadow map is needed, but likely more in practice).
Unfortunately this is still quite low and will surely be hit quite soon
with directional shadow added to it. One idea to workaround this would
be to time slice the update of some lights, but this opens a whole can
of worms that I'm not ready to open for now so I created #119890 for
future reference.
Some notes, most lights seems to request around 3 LODs. It might help
to allow requesting at least 2 LODs if we are rendering since volumes
might want lower LOD available for volumes.
I added a very simplistic heuristic that also lowers the max tilemaps
when transforming, animation playback or navigating the 3D view to
improve the responsiveness of the engine. Note that this doesn't
only lowers the resolution to the minimum requested one. So it should
be good enough in most cases.
Pull Request: https://projects.blender.org/blender/blender/pulls/119889
Cryptomatte passes would generate a feathered outline
in Metal due to missing texture fence in chained
read->modify->write->read->... patterns.
Added imageFence function to explicitly state that
imageStore's should be visible to future imageLoad's.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/119163
This define all aliases for supported types,
document which one to use in C++ shared code,
move relevant defines to their backend file.
Rename `bool1` to `bool32_t` and cleanup
its usage as mentioned in #118961.
Rel. #118961
Pull Request: https://projects.blender.org/blender/blender/pulls/119098
This patch adds an alternative path for devices/OSs
which do not support native texture atomics in Metal.
Support is encapsulated within the backend, ensuring
any allocated texture with the USAGE_ATOMIC flag is
allocated with a backing buffer, upon which atomic
operations happen.
The shader generation is also changed for the atomic
case, which instructs the backend to insert additional
buffer bind-points for the buffer resource. As Metal
also only supports buffer-backed textures for
textureBuffers or 2D textures, TextureArrays and
3D textures are emulated within a 2D texture, with
sample locations being indirected.
All usage of atomic textures MUST now utilise the
correct atomic texture types in the high level shader
and GPUShaderCreateInfo declarations.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/115956
This adds a new way of computing occlusion using visibility bitmask. To
make it more algorithm agnostic, we name it horizon scan.
This cleans-up / simplify the code compared to the Horizon based solution.
There is no more trickery for fading influence of distant samples which
makes the result match cycles closer.
This introduces a new thickness option. Maintaining it relatively low
makes it possible to avoid over occlusion because of in front geometry.
Making it too low will cause under occlusion.
Related #112979
Pull Request: https://projects.blender.org/blender/blender/pulls/114150
This adds correct object bounds estimation.
This works by creating an occupancy texture where one
bit represents one froxel. A geometry pre-pass fill this
occupancy texture and doesn't do any shading. Each bit
set to 0 will not be considered occupied by the object
volume and will discard the material compute shader for
this froxel.
There is 2 method of computing the occupancy map:
- Atomic XOR: For each fragment we compute the amount of
froxels **center** in-front of it. We then convert that
into occupancy bitmask that we apply to the occupancy
texture using `imageAtomicXor`. This is straight forward
and works well for any manifold geometry.
- Hit List: For each fragment we write the fragment depth
in a list (contained in one array texture). This list
is then processed by a fullscreen pass (see
`eevee_occupancy_convert_frag.glsl`) that sorts and
converts all the hits to the occupancy bits. This
emulate Cycles behavior by considering only back-face
hits as exit events and front-face hits as entry events.
The result stores it to the occupancy texture using
bit-wise `OR` operation to compose it with other non-hit
list objects. This also decouple the hit-list evaluation
complexity from the material evaluation shader.
## Limitations
### Fast
- Non-manifolds geometry objects are rendered incorrectly.
- Non-manifolds geometry objects will affect other objects
in front of them.
### Accurate
- Limited to 16 hits per layer for now.
- Non-manifolds geometry objects will affect other objects
in front of them.
Pull Request: https://projects.blender.org/blender/blender/pulls/113731
Texture Atomics have been added in Metal 3.1
and enable the original implementations of
shadow update and irradiance cache baking.
However, a fallback solution will be
required for versions under macOS 14.0 utilising
buffer-backed textures instead.
This patch also includes a stub implementation if
building/running on older macOS versions which
provides locally-synchronized texture access in
place of atomics. This enables some effects to be
partially tested, and ensures non-guarded use
of imageAtomic functions does not result
in compilation failure.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/112866
imageStoreFast provides a variant of imageStore which does
not perform any bounds checking, reducing shader divergence,
register pressure and increasing performance through fewer
instructions.
However, this should only be used for cases where the writing
coordinate is guaranteed to fall within the texture.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/111750
Addressing a number of small issues and OOB reads/writes occuring in
EEVEE Next shadows + lighting passes. Improving correctness for unit
tests. Shadows are not yet working overall, but this unblocks progress
towards unit tests.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/108768
Adds support for Storage buffers, including changes to the resource
binding model to ensure explicit resource bind locations are supported
for all resource types.
Storage buffer support also includes required changes for shader
source generation and SSBO wrapper support for other resource
types such as GPUVertBuf, GPUIndexBuf and GPUUniformBuf.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/107175
Shader source requires explicit conversions and shader address
space qualifers in certain places in order to compile for Metal.
We also require constructors for a number of default struct types.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/106219
Optimized node graphs do not get cached and were
not correctly freed once their reference count reached
zero, due to being excluded from the GPUPass garbage
collection.
Also suppress Metal shader warnings, which are prevalent
during material optimization.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/105795
Replace texelFetch calls with a texture point-sample rather than a textureRead call. This increases texture cache utilisation when mixing between sampled calls and reads. Bounds checking can also be removed from these functions, reducing instruction count and branch divergence, as the sampler routine handles range clamping.
Authored by Apple: Michael Parkin-White
Ref T96261
Depends on D16923
Reviewed By: fclem
Maniphest Tasks: T96261
Differential Revision: https://developer.blender.org/D17021
This patch adds support for compilation and execution of GLSL compute shaders. This, along with a few systematic changes and fixes, enable realtime compositor functionality with the Metal backend on macOS. A number of GLSL source modifications have been made to add the required level of type explicitness, allowing all compilations to succeed.
GLSL Compute shader compilation follows a similar path to Vertex/Fragment translation, with added support for shader atomics, shared memory blocks and barriers.
Texture flags have also been updated to ensure correct read/write specification for textures used within the compositor pipeline. GPU command submission changes have also been made in the high level path, when Metal is used, to address command buffer time-outs caused by certain expensive compute shaders.
Authored by Apple: Michael Parkin-White
Ref T96261
Ref T99210
Reviewed By: fclem
Maniphest Tasks: T99210, T96261
Differential Revision: https://developer.blender.org/D16990
Affecting render output preview when tone mapping is used, and EEVEE scenes such as Mr Elephant rendering in pink due to missing shaders.
Authored by Apple: Michael Parkin-White
Ref T103635
Ref T96261
Reviewed By: fclem
Maniphest Tasks: T103635, T96261
Differential Revision: https://developer.blender.org/D16923
Although viewport compositor isn't supported yet on Apple deviced the
shaderlibs are compiled. The compositor shaders uses mat3 constructor
with a single vec3 and 6 floats. This constructor wasn't defined in
metal so it failed the compilation.
This patch adds the override to mat3.