Since the introduction of storage buffers in Blender, the calling
code has been responsible for ensuring the buffer meets allocation
requirements. All backends require the allocation size to be divisible
by 16 bytes. Until now, this was sufficient, but with GPU subdivision
changes, an external library must also adhere to these requirements.
For OpenSubdiv (OSD), some buffers are not 16-byte aligned, leading
to potential misallocation. Currently, this is mitigated by allocating
a few extra bytes, but this approach has the drawback of potentially
reading unintended bytes beyond the source buffer.
This PR adopts a similar approach to vertex buffers: the backend handles
extra byte allocation while ensuring data uploads and downloads function
correctly without requiring those additional bytes.
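As a rough illustration of the idea (not the actual backend code), the padding and the clamped copy could look like the sketch below. `align_up` and `PaddedBuffer` are hypothetical names, and a CPU-side `std::vector` stands in for the GPU allocation.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: round `size` up to the next multiple of `alignment`.
// `alignment` is assumed to be a power of two.
constexpr std::size_t align_up(std::size_t size, std::size_t alignment)
{
  return (size + alignment - 1) & ~(alignment - 1);
}

// Hypothetical stand-in for a backend storage buffer.
struct PaddedBuffer {
  std::size_t usage_size;          // Size requested by the caller.
  std::vector<unsigned char> data; // Backing store, padded to 16 bytes.

  explicit PaddedBuffer(std::size_t size)
      : usage_size(size), data(align_up(size, 16), 0)
  {
  }

  // Copy only `usage_size` bytes, so the source is never read past its end.
  void update(const void *src)
  {
    std::memcpy(data.data(), src, usage_size);
  }

  // Copy back only the bytes the caller asked for, not the padding.
  void read(void *dst) const
  {
    std::memcpy(dst, data.data(), usage_size);
  }
};
```

With this shape, a 20-byte request is backed by a 32-byte allocation while uploads and downloads still touch only 20 bytes.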
No changes were needed for Metal, as its allocation size is already
aligned to 256 bytes.
**Alternative solutions considered**:
- Copying the CPU buffer to a larger buffer when needed (performance impact).
- Modifying OSD buffers to allocate extra space (requires changes to an external library).
- Implementing GPU_storagebuf_update_sub.
Ref #135873
Pull Request: https://projects.blender.org/blender/blender/pulls/135716
The patch strings did not have thread-safe initialization.
The string might have been returned null or incomplete,
which might trigger compilation errors.
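One common way to get thread-safe lazy initialization in C++ is a function-local static. The sketch below only illustrates that pattern and is not necessarily how this change does it; `build_patch` and `patch_get` are hypothetical names.

```cpp
#include <string>

// Hypothetical builder for the patch source.
static std::string build_patch()
{
  return "#define EXAMPLE_PATCH 1\n";
}

// Since C++11, a function-local static is initialized exactly once, and
// concurrent callers block until that initialization has completed, so no
// caller can observe a null or half-built string.
const std::string &patch_get()
{
  static const std::string patch = build_patch();
  return patch;
}
```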
It has been confirmed that the latest release of AMD drivers has fixed
issues for both OpenGL and Vulkan. Users should use AMD driver 25.3.1
or later. Removing the workaround as it has performance penalties on
RDNA2 based GPUs.
Reference: #135516
Pull Request: https://projects.blender.org/blender/blender/pulls/135630
I've hit this a couple of times and disabling it always worked fine for me. So
it's good to make it more obvious that there is an actual bug instead of a
missed optimization.
Pull Request: https://projects.blender.org/blender/blender/pulls/135467
Add a `--profile-gpu` launch argument.
When set, it generates a profile in the Trace Event Format with CPU and
GPU metrics based on GPU debug scopes.
https://profilerpedia.markhansen.co.nz/formats/trace-event-format/
The profiles are best viewed at https://ui.perfetto.dev/
Notes:
- The profiler captures everything from app start to exit.
- Being JSON-based, the profiles can become relatively large, but they
compress very well.
- Only OpenGL profiling is supported for now, but the report formatting
code can be shared across backends.
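For reference, a single "complete" event (`"ph":"X"`) in the Trace Event Format carries a name, a start timestamp and a duration, both in microseconds, plus process and thread ids. The sketch below shows how one such entry could be formatted; `trace_event` is a hypothetical helper, not the function used by the profiler, and it assumes the scope name needs no JSON escaping.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: format one complete ("X") event.
// `ts_us` and `dur_us` are in microseconds, as the format expects.
// Assumes `name` contains no characters that require JSON escaping.
std::string trace_event(
    const std::string &name, std::uint64_t ts_us, std::uint64_t dur_us, int pid, int tid)
{
  return "{\"name\":\"" + name + "\",\"ph\":\"X\",\"ts\":" + std::to_string(ts_us) +
         ",\"dur\":" + std::to_string(dur_us) + ",\"pid\":" + std::to_string(pid) +
         ",\"tid\":" + std::to_string(tid) + "}";
}
```

A viewer such as Perfetto accepts a JSON array of these objects. Using separate `tid` values for CPU and GPU timings is one way to show them as separate tracks.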
Pull Request: https://projects.blender.org/blender/blender/pulls/133557
Framebuffers are getting freed in the GPUContext base class destructor. But
the framebuffer destructors use the MTL/VK/GLContext derived class, whose
destructor has already completed at this point. So these contexts are no
longer valid to use.
Now free the framebuffers earlier.
This caused ASAN warnings, but it is not known to cause actual bugs.
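The ordering problem comes from the C++ rule that the derived destructor runs to completion before the base destructor starts. A minimal sketch, using hypothetical `Context` and `BackendContext` types rather than the real GPU classes:

```cpp
#include <memory>
#include <string>
#include <vector>

// Records the order in which destructors run.
std::vector<std::string> destruction_log;

// Hypothetical stand-in for the base context class.
struct Context {
  virtual ~Context()
  {
    // By the time this runs, the derived part of the object is already gone,
    // so nothing here may rely on derived-class state.
    destruction_log.push_back("base");
  }
};

// Hypothetical stand-in for a backend context (MTL/VK/GL).
struct BackendContext : Context {
  ~BackendContext() override
  {
    // Resources that need the backend context must be freed here (or earlier).
    destruction_log.push_back("derived");
  }
};

void destroy_example()
{
  destruction_log.clear();
  std::unique_ptr<Context> ctx = std::make_unique<BackendContext>();
  ctx.reset(); // Runs ~BackendContext() first, then ~Context().
}
```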
Pull Request: https://projects.blender.org/blender/blender/pulls/132504
Ensure `gl_ViewportIndex` and `gl_Layer` are properly forwarded from the
geometry shader, and don't write to them from the vertex shader if
there's a geometry shader stage.
Fixes the Displacement "dicing" render tests on Nvidia OpenGL.
Pull Request: https://projects.blender.org/blender/blender/pulls/131875
Rendering animations from Python scripts via `bpy.ops.render.opengl()`
did not trigger any of the notifications in the Metal back-end to
indicate a frame had been rendered and that the associated resources
could be released. This adds a call to GPU_render_step() after each
render. For the original asset in the bug report this reduces the high
memory watermark from 30 GB to 13 GB for 500 frames. 13 GB is likely
still too high, so there are probably additional leaks that need to be
addressed. This should only be considered a partial fix.
Authored by Apple: James McCarthy
Co-authored-by: James McCarthy <jamesmccarthy@apple.com>
Co-authored-by: Clément Foucault <foucault.clem@gmail.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/131085
GPUViewport is creating a bunch of framebuffer textures for itself, but
some space types never initialize/use them. E.g. Sequencer, Nodes etc.
only ever use the "overlay" texture. Eventually when viewport is
"drawn", it combines this uninitialized texture data and then only by
luck it happens that most of the time it is black. But not always!
The textures were only cleared (right now) on Metal backend, under
GPU_clear_viewport_workaround as if it was some driver workaround. Stop
doing that, and just clear them always.
However, there was seemingly a performance issue on OpenGL, when this
clear was being done. At least on my machine (Win10, Geforce RTX
3080Ti), the overhead of doing the clears is measurable, and is caused
by usage of GL4.4 glClearTexImage instead of a framebuffer clear. As if
glClearTexImage makes "pixel data to exist" on the CPU side and then
later on binding this framebuffer sends off that data to the GPU, or
somesuch.
More details in the PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/131518
There seems to be an issue inside Intel OpenGL driver of legacy
platforms that fails to link `gpu_shader_sequencer_strips`.
Uniform locations are used to fix a specialization constants issue.
This PR only adds the uniform location when the shader can be
specialized. It is unclear what is actually failing inside the driver
but there are other issues with the driver.
Pull Request: https://projects.blender.org/blender/blender/pulls/131293
This happened because NVidia GPUs require higher alignment
for SSBO binds than for vertex inputs.
This is related to #131103 which fixed it for Vulkan.
Add a common capability option for that.
This port is not so straightforward.
This shader is used in different configurations and is
available to python bindings. So we need to keep
compatibility with different attributes configurations.
This is why attributes are loaded per component and a
uniform sets the length of the component.
Since this shader can be used from both the imm and batch
API, we need to inject some workarounds to bind the buffers
correctly.
The end result is still less versatile than the previous
Metal workaround (which supported more attribute fetch modes),
but it is also way less code.
### Limitations:
The new shader has some limitations:
- Both `color` and `pos` attributes need to be `F32`.
- Each attribute needs to be 4-byte aligned.
- Fetch type needs to be `GPU_FETCH_FLOAT`.
- Primitive type needs to be `GPU_PRIM_LINES`, `GPU_PRIM_LINE_STRIP` or `GPU_PRIM_LINE_LOOP`.
- If drawing using an index buffer, it must contain no primitive restart.
Rel #127493
Co-authored-by: Jeroen Bakker <jeroen@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/129315
OpenGL & Vulkan have workarounds for when gl_Layer/gl_ViewportIndex isn't
supported. In this case a geometry shader is generated. This
geometry shader doesn't follow the GLSL standard and doesn't work on
some platforms. This has not been an issue as the platforms that
don't support gl_Layer/gl_ViewportIndex don't show the issue.
According to the specs gl_Layer and gl_ViewportIndex should be set for
each call to EmitVertex. A shader should not rely on EmitVertex
reusing the same memory.
Ref https://www.khronos.org/opengl/wiki/Geometry_Shader#Layered_rendering
```
Warning: gl_Layer and gl_ViewportIndex are GS output variables. As such, every time
you call EmitVertex, their values will become undefined. Therefore, you must set
these variables every time you loop over outputs.
```
Issue detected during development of !129062
Pull Request: https://projects.blender.org/blender/blender/pulls/130506
Adding a dummy storage buffer to the classification shader
seems to fix the issue on Qualcomm drivers (WoA).
The workaround is added to the force workaround option to
allow other platforms to test the fix.
Rel #122837
Pull Request: https://projects.blender.org/blender/blender/pulls/129857
For C/C++, doc-strings should be located in headers.
Move function comments into the headers, in some cases merging them
with existing doc-strings, in other cases moving implementation
notes into the function body.
Avoid measuring the length of strings repeatedly by passing their
length along with their data with `StringRefNull`. Null termination
seems to be necessary still for passing the shader sources to OpenGL.
Though I doubt this is a bottleneck, it's still nice to avoid overhead from
string operations and this helps move in that direction.
Pull Request: https://projects.blender.org/blender/blender/pulls/127702
The goal is to reduce the startup time cost of
all of this parsing and string replacement.
All comments are now stripped at compile time.
This comment check added noticeable slowdown at
startup in debug builds and during preprocessing.
Put all metadata between start and end tokens.
Use very simple parsing using `StringRef` and
hash all identifiers.
Move all the complexity to the preprocessor, which
massages the metadata into the well-formed input
expected by the runtime parser.
All identifiers are compile time hashed so that no string
comparison is made at runtime.
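To illustrate the technique, a `constexpr` hash lets identifiers be matched with an integer `switch` instead of string comparisons. The sketch below uses FNV-1a as an example hash. The actual hash function, the `hash_identifier` name and the `Keyword` enum are assumptions made for illustration. It requires C++17.

```cpp
#include <cstdint>
#include <string_view>

// FNV-1a, 64-bit. Usable both at compile time and at runtime.
constexpr std::uint64_t hash_identifier(std::string_view str)
{
  std::uint64_t hash = 14695981039346656037ull; // FNV offset basis.
  for (char c : str) {
    hash ^= static_cast<unsigned char>(c);
    hash *= 1099511628211ull; // FNV prime.
  }
  return hash;
}

// The hash of an empty string is the offset basis, checked at compile time.
static_assert(hash_identifier("") == 14695981039346656037ull);

// Hypothetical set of identifiers the runtime parser cares about.
enum class Keyword { Unknown, Vertex, Fragment };

constexpr Keyword keyword_from(std::string_view identifier)
{
  // The case labels are evaluated by the compiler. At runtime only the
  // input is hashed, followed by an integer comparison.
  switch (hash_identifier(identifier)) {
    case hash_identifier("vertex"):
      return Keyword::Vertex;
    case hash_identifier("fragment"):
      return Keyword::Fragment;
    default:
      return Keyword::Unknown;
  }
}
```

A useful side effect of the `switch` form is that two labels hashing to the same value fail to compile, so collisions among known identifiers are caught at build time.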
Speed up the source loading:
- from 10ms to 1.6ms (6.25x speedup) in release
- from 194ms to 6ms (32.3x speedup) in debug
Follow up #129009
Pull Request: https://projects.blender.org/blender/blender/pulls/128927
Move most of the string preprocessing used for MSL
compatibility to `glsl_preprocess`.
Enforce some changes like matrix constructor and
array constructor to the GLSL codebase. This is
for C++ compatibility.
Additionally reduce the amount of code duplication
inside the compatibility code.
Pull Request: https://projects.blender.org/blender/blender/pulls/128634
We can have deferred and non-deferred shaders (so, different threads)
with the same `additional_info` dependencies trying to finalize the same
`ShaderCreateInfo`.
This ensures `finalize` always runs from the main thread to avoid race
conditions.
Pull Request: https://projects.blender.org/blender/blender/pulls/128281
This rare GPU has z-fighting issues in editor mode. Might be fixable by
changing the bias, but would decrease precision on other platforms as
well. Better to move this GPU to limited support. It is working, just
has some drawing artifacts.
See #128179
Pull Request: https://projects.blender.org/blender/blender/pulls/128351