Rendering animations from Python scripts via `bpy.ops.render.opengl()`
did not trigger any of the notifications in the Metal back-end to
indicate a frame had been rendered and that the associated resources
could be released. This adds a call to GPU_render_step() after each
render. For the original asset in the bug report this reduces the high
memory watermark from 30gb to 13gb for 500 frames. 13gb is likely
still too high and therefore it is likely there are additional leaks
that need to be addressed so this should only be considered a partial
fix.
Authored by Apple: James McCarthy
Co-authored-by: James McCarthy <jamesmccarthy@apple.com>
Co-authored-by: Clément Foucault <foucault.clem@gmail.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/131085
GPUViewport is creating a bunch of framebuffer textures for itself, but
some space types never initialize/use them. E.g. Sequencer, Nodes etc.
only ever use the "overlay" texture. Eventually when viewport is
"drawn", it combines this uninitialized texture data and then only by
luck it happens that most of the time it is black. But not always!
The textures were only cleared (right now) on Metal backend, under
GPU_clear_viewport_workaround as if it was some driver workaround. Stop
doing that, and just clear them always.
However, there was seemingly a performance issue on OpenGL, when this
clear was being done. At least on my machine (Win10, Geforce RTX
3080Ti), the overhead of doing the clears is measurable, and is caused
by usage of GL4.4 glClearTexImage instead of a framebuffer clear. As if
glClearTexImage makes "pixel data to exist" on the CPU side and then
later on binding this framebuffer sends off that data to the GPU, or
somesuch.
More details in the PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/131518
The string `msl_patch_default` can have been read
partially uninitialized or initialized multiple
time and read uncomplete during multithreaded
compilation.
This should fix the GPU tests randomly failing on mac.
While this would never fail when blender runs from the UI (since
UI shaders are init in single threaded manner and always compile
before EEVEE shaders), this race condition could happen when running
EEVEE through background rendering or running tests.
Pull Request: https://projects.blender.org/blender/blender/pulls/131580
The new buffer size could have been non aligned when using the
fractional growing heuristic. This non aligned allocation
would then trigger an assert at the SSBO constructor.
Aligning the alocation size fixes the issue.
This happened because NVidia GPUs require higher alignment
for SSBO binds than for vertex inputs.
This is related to #131103 which fixed it for vulkan.
Add a common capability option for that.
This port is not so straightforward.
This shader is used in different configurations and is
available to python bindings. So we need to keep
compatibility with different attributes configurations.
This is why attributes are loaded per component and a
uniform sets the length of the component.
Since this shader can be used from both the imm and batch
API, we need to inject some workarounds to bind the buffers
correctly.
The end result is still less versatile than the previous
metal workaround (i.e.: more attribute fetch mode supported),
but it is also way less code.
### Limitations:
The new shader has some limitation:
- Both `color` and `pos` attributes need to be `F32`.
- Each attribute needs to be 4byte aligned.
- Fetch type needs to be `GPU_FETCH_FLOAT`.
- Primitive type needs to be `GPU_PRIM_LINES`, `GPU_PRIM_LINE_STRIP` or `GPU_PRIM_LINE_LOOP`.
- If drawing using an index buffer, it must contain no primitive restart.
Rel #127493
Co-authored-by: Jeroen Bakker <jeroen@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/129315
In some cases the MTLContext was being destroyed before all GPU work was completed causing the (outstanding) command buffer completion event handler to update a command buffer that had already been freed. This behaviour was introduced by [this](https://projects.blender.org/blender/blender/commit/6da42e9c951b) change which updated the event handler to track the number of outstanding command buffers per context as well as system-wide.
Reproduced the issue with ASAN enabled and confirmed that waiting for the GPU to complete fixes the issue.
Also contains a minor fix for unitiiliased values in MTLAttachments identified by ASAN.
Authored by Apple: James McCarthy"
Co-authored-by: James McCarthy <jamesmccarthy@apple.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/129686
For Metal we can change the texture usage flags to get more optimal
behaviour - one example is adding the attachment flag so we can utilise
renders to do texture clears. However these usage flags are used as the
part of the match-criteria when trying to reuse released textures in
the texture pool.
The modifications means a request for the same type of texture will
fail causing a cache miss. When we render to an
image-view the texture pool is not released until the final sample has
been rendered as we consider the entire render to be a single frame
(as opposed to normal viewport rendering when we are presenting the
intermediate results).
This causes the texture pool to grow and grow and grow hence the large
memory usage. This fix splits the usage flags
into two sets, the internal ones we use to create the MTLTexture (which
we may modify) and the originally requested ones. The originally requested
ones are used for the texture pool matching.
This fix also improves memory efficiency for normal viewport rendering.
Mr Elephant Scene
Before -> After
Load scene in viewport: 13.04Gb -> 9.15 Gb
Viewport Render Image: 78.69Gb -> 16.61Gb
Authored by Apple: James McCarthy
Pull Request: https://projects.blender.org/blender/blender/pulls/129951
Crash manifested after the inclusion of #128877.
The very tall 3D texture tested by the new code
were not supported / tested by the Metal Backend.
Simply adding the appropriate upfront checks fixes
the issue.
Needs to be backported to 4.2
Avoid measuring the length of strings repeatedly by passing their
length along with their data with `StringRefNull`. Null termination
seems to be necessary still for passing the shader sources to OpenGL.
Though I doubt this is a bottleneck, it's still nice to avoid overhead from
string operations and this helps move in that direction.
Pull Request: https://projects.blender.org/blender/blender/pulls/127702
Running Xcode memory graphs and the Instruments tools revealed
memory leaks caused, in the main, by over-retained objects.
This removes the unnecessary 'retains' and adds some asserts
to guard against over-retaining in the future.
There are a few memory leaks remaining involving PyUnicode_DecodeUTF8
but I am unable to identify the cause of these at this time.
Authored by Apple: James McCarthy
Pull Request: https://projects.blender.org/blender/blender/pulls/129117
Addresses the case when Blender is shutdown before the
parallel compiler has finished processing all the shader batches.
The parallel compiler destructor will now attempt to terminate all
of the outstanding batches and free the shaders.
Authored by Apple: James McCarthy
Pull Request: https://projects.blender.org/blender/blender/pulls/129172
The goal is to reduce the startup time cost of
all of these parsing and string replacement.
All comments are now stripped at compile time.
This comment check added noticeable slowdown at
startup in debug builds and during preprocessing.
Put all metadatas between start and end token.
Use very simple parsing using `StringRef` and
hash all identifiers.
Move all the complexity to the preprocessor that
massagess the metadata into a well expected input
to the runtime parser.
All identifiers are compile time hashed so that no string
comparison is made at runtime.
Speed up the source loading:
- from 10ms to 1.6ms (6.25x speedup) in release
- from 194ms to 6ms (32.3x speedup) in debug
Follow up #129009
Pull Request: https://projects.blender.org/blender/blender/pulls/128927
Reported by @fclem. On Metal GPU_finish enacts a CPU<-> GPU sync by
submitting a command buffer and waiting on the completion event.
However if there was no command buffer to submit then the call was
returning immediately, regardless of any outstanding GPU work.
This fixes that case by keeping track of all outstanding work and
blocking on that.
Authored by Apple: James McCarthy
Co-authored-by: James McCarthy <jamesmccarthy@apple.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/128987
Move most of the string preprocessing used for MSL
compatibility to `glsl_preprocess`.
Enforce some changes like matrix constructor and
array constructor to the GLSL codebase. This is
for C++ compatibility.
Additionally reduce the amount of code duplication
inside the compatibility code.
Pull Request: https://projects.blender.org/blender/blender/pulls/128634
The metal backend assumes that textures can always be allocated. When
metal later detects that the texture cannot be 'baked' it leads to
undesired behavior as Blender think it has a texture with memory
allocated.
This PR only changes the GPU_TEXTURE_3D case as that leads to issues
when loading large voluetric data. Arrayed textures will still fail
as that requires different checks to be added. I rather re-view the
current implementation in the future.
> NOTE: The max depth is hardcoded to 2048
Change should be backported to 4.2
Pull Request: https://projects.blender.org/blender/blender/pulls/128365
Takes into account any offset that must be added to the vertex index
(usually supplied as baseVertex or startVertex in the Metal draw call)
in the code that emulates the SSBO vertex fetch.
Authored by Apple: James McCarthy
Pull Request: https://projects.blender.org/blender/blender/pulls/127864
Parallel shader compilation introduced `GPU_shader_cache_dir_clear_old`.
The implementation was specific to OpenGL and could not be overwritten
by other backends. This PR improves the implementation so the backend
can have its own implementation.
This is needed for upcoming changes to the Vulkan backend where we
want to use similar mechanisms to speed up shader compilation and caching.
Pull Request: https://projects.blender.org/blender/blender/pulls/127680