- Expected results should come before actual result.
- Add test case for 8192 bytes as apple has a push constants size of 4096.
- Add more variation to the first order test data.
Improvements detected when working on vulkan backend and validated they
work on metal and opengl as well.
Pull Request: https://projects.blender.org/blender/blender/pulls/120557
A small logic issue caused all write-only image resources
to be tagged as read-write in all cases. This caused
correctness issues on Intel and AMD GPUs which
are resolved through this change.
Change also yields a small performance uplift
due to enabling improved non-dependent workload
scheduling.
Authored by Apple: Michael Parkin-White
Co-authored-by: Michael Parkin-White <mparkinwhite@apple.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/120528
Adds support for subpass transition for AMD/Intel IMR
GPUs. This enables correct functioning of EEVEE Next
deferred lighting pass on AMD platforms.
The emulation is consistent with the OpenGL approach
of generating additional texture bindings in the shader
for subpass inputs, and splitting render passes across
sub-pass boundaries.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/119784
MoltenVK original intent was to let developers work on a mac system developing
for the vulkan eco-system. MoltenVK doesn't support all the features that we
require and would require additional workarounds to be actually supported.
It is not expected that we will release Blender with MoltenVK for this reason.
But it still has value for shader developers to validate shaders on metal and
vulkan on a single platform.

Pull Request: https://projects.blender.org/blender/blender/pulls/117940
Removes the implicit USAGE_ATTACHMENT flag from
atomic fallback textures which are buffer-backed.
This usage flag results in a validation failure, and is
not required by these textures as they are cleared
via the backing buffer.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/119785
This PR is another step in the refactor described by #116901.
This change applies to the triangle index buffer. The main improvement
is the ability to recognize when the mesh corner triangles index array
can be uploaded directly (when there is a single material and no hidden
faces). In that case the index data should be copied directly to the
GPU rather than to a temporary array owned by the IBO first. Though
that isn't implemented yet since it will be handled by the GPU module
later, the code is now structured to make that change simple from the
data extraction perspective.
Other than that, the main change is to not use the extractor iterator
framework anymore, and to set index data directly instead of using GPU
API functions. Though we're mainly bottlenecked by memory-bandwidth
anyway, it's nice to avoid function call overhead.
We also now avoid creating the array of sorted triangle indices when
there is a single material and no hidden faces. And we don't use
restart indices for the single-material case anymore. For Metal that's
nice because we can avoid `strip_restart_indices`.
I didn't notice significant performance improvements in my test files
beyond a few percent here and there. With a hacked implementation of
the copy-directly-to-the-gpu optimization, I did see more consistent
improvements though.
Pull Request: https://projects.blender.org/blender/blender/pulls/119130
Tune Metal compilation parameters for horizon scan
shaders for optimal performance. Selectively unrolling
loops and modifying compilation heuristics results in a
~25% uplift in tracing shader performance, due to
improved latency management.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/119737
The Perlin noise algorithms suffer from precision issues when a coordinate
is greater than about 250000.
To fix this the Perlin noise texture is repeated every 100000 on each axis.
This causes discontinuities every 100000, however at such scales this
usually shouldn't be noticeable.
Pull Request: https://projects.blender.org/blender/blender/pulls/119884
For Batch::verts some values weren't cleared,
for Batch::inst values after the array would be cleared,
although as these were already zeroed this probably didn't cause
problems in practice.
This limits the number of tilemaps per LOD that can be fed to avoid the
easy to hit "Too many shadow updates" (#119757).
This allows for a max 64 tilemaps to be updated at once at their lowest
requested LOD (so ~10.6667 point lights if every faces of the punctual
shadow map is needed, but likely more in practice).
Unfortunately this is still quite low and will surely be hit quite soon
with directional shadow added to it. One idea to workaround this would
be to time slice the update of some lights, but this opens a whole can
of worms that I'm not ready to open for now so I created #119890 for
future reference.
Some notes, most lights seems to request around 3 LODs. It might help
to allow requesting at least 2 LODs if we are rendering since volumes
might want lower LOD available for volumes.
I added a very simplistic heuristic that also lowers the max tilemaps
when transforming, animation playback or navigating the 3D view to
improve the responsiveness of the engine. Note that this doesn't
only lowers the resolution to the minimum requested one. So it should
be good enough in most cases.
Pull Request: https://projects.blender.org/blender/blender/pulls/119889
Textures that are GPU-compressed already (in practice: from DDS files
that are DXT1/DXT3/DXT5 compressed) now can stay GPU compressed
in Vulkan, similar to how that works on OpenGL.
Additionally, fixed lack of mipmaps in Vulkan textures. The textures
were created with mipmaps (good), the sampler too (good), but
the vulkan image view was always saying "yo, this is mip 0 only"
because mip range variables were never set to anything than zero.
Pull Request: https://projects.blender.org/blender/blender/pulls/119866
Every vulkan installation has a vk.xml file containing the vulkan specification
in a machine readable fasion.
This PR uses the vk.xml to generate to_string functions for data types blender uses.
When updating to a new specification or when changing features/extensions we
should re-generate the to_string functions.
The generator is implemented in `vk_to_string.py`.
Pull Request: https://projects.blender.org/blender/blender/pulls/119880
Now that all relevant code is C++, the indirection from the C struct
`GPUVertBuf` to the C++ `blender::gpu::VertBuf` class just adds
complexity and necessitates a wrapper API, making more cleanups like
use of RAII or other C++ types more difficult.
This commit replaces the C wrapper structs with direct use of the
vertex and index buffer base classes. In C++ we can choose which parts
of a class are private, so we don't risk exposing too many
implementation details here.
Pull Request: https://projects.blender.org/blender/blender/pulls/119825
When frame capturing cannot be start an error is printed to the console.
Most of the time the issue is that you're not running from within a frame
capturing environment. For example not from your IDE/GPU debugger.
The print statement is often just not that useful. Especially when
running the `WITH_GPU_DRAW_TESTS` where it floods the console.
Pull Request: https://projects.blender.org/blender/blender/pulls/119783
Resolves an issue with stroke rendering in
Metal using the geometry shader fallback
path. Stroke rendering now matches OpenGL
which should enable the GPencil fill tool to
function correctly at all zoom levels.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/119660
This uses Spherical Harmonics to store the indirect lighting and
distant lighting visibility.
We can then reuse this information for each closure which divide
the cost of it by 2 or 3 in many cases, doing the scanning once.
The storage cost is higher than previous method, so we split the
resolution scaling to be independant of raytracing.
The spatial filtering has been split to its own pass for performance
reason. Upsampling now only uses 4 bilinearly interpolated samples
(instead of 9) using bilateral weights to avoid bleeding.
This also add a missing dot product (which soften the lighting
around corners) and fixes the blocky artifacts seen at lower
resolution.
Pull Request: https://projects.blender.org/blender/blender/pulls/118924
Simplifies/optimizes the "font" shader. It runs faster now too, but primarily
this is so that it loads/initializes faster.
* Instead of doing blur via individual bilinear samples (where each sample is 4
texel fetches), do raw texel fetches of the kernel footprint and compute final
result by shifting the kernel weights according to bilinear fraction weight.
For 5x5 blur, this reduces number of texel fetches from 64 down to 36.
* Instead of checking "is the texel inside the glyph box? if so, then fetch it",
first fetch it, and then set result to zero if it was outside. Simplifies the
branching code flow in the compiled GPU shader.
* Avoid costly integer modulo/division for "unwrapping" the font texture. The
texture width is always power of two size, so division/modulo can be replaced
by masking and a shift. Setup uniforms to contain the needed data.
### Fixes
* The 3x3 blur was not doing a 3x3 blur, due to a copy-pasta typo (one of the
sample offsets was repeated twice, and thus another sample offset was
missing).
* Blur towards left/top edges of the glyphs had artifacts, because float->int
casting in GLSL rounds towards zero, but the code actually wanted to round
towards floor.
Image of how the blur has changed in the PR.
### First time initialization
* Windows 10, NVIDIA RTX 3080Ti, OpenGL: 274.4ms -> 51.3ms
* macOS, Apple M1 Max, Metal: 456ms -> 289ms (this is including PSO creation
time).
### Shader performance/complexity
Performance I only measured on macOS (M1 Max), by making a BLF text that is
scaled up to cover most of screen via Python. Using Xcode Metal profiler,
drawing that text with 5x5 shadow blur: 1.5ms -> 0.3ms.
More performance analysis details in PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/119653
Resolves custom attribute types for ints and booleans by ensuring
conversion mode is correct. Previously, the attribute declarations
were assumed to be linear. However, patch ensures the correct
attribute index is now fetched, ensuring the conversion mode
is correctly specified for non-linear attribute ID's.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/119569