I've hit this a couple of times and disabling it always worked fine for me. So
it's good to make it more obvious that there is an actual bug instead of a
missed optimization.
Pull Request: https://projects.blender.org/blender/blender/pulls/135467
Add a `--profile-gpu` launch argument.
When set, it generates a profile in the Trace Event Format with CPU and
GPU metrics based on GPU debug scopes.
https://profilerpedia.markhansen.co.nz/formats/trace-event-format/
The profiles are best viewed at https://ui.perfetto.dev/
Notes:
- The profiler captures everything from app start to exit.
- Being JSON-based, the profiles can become relatively large, but they
compress very well.
- Only OpenGL profiling is supported for now, but the report formatting
code can be shared across backends.
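
For illustration, a minimal sketch of what a profile in the Trace Event
Format can look like when written as a plain JSON array of "complete"
events (phase `X`, timestamps and durations in microseconds). The
`TraceEvent` struct and `trace_events_to_json` function are hypothetical
names for this sketch and are not taken from the actual implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

/* One "complete" event (phase "X") in the Trace Event Format.
 * Timestamps and durations are expressed in microseconds. */
struct TraceEvent {
  std::string name;
  std::int64_t ts_us;
  std::int64_t dur_us;
  int pid;
  int tid;
};

/* Serialize events as a JSON array, one of the layouts the format accepts.
 * Assumption: names need no JSON escaping in this sketch. */
std::string trace_events_to_json(const std::vector<TraceEvent> &events)
{
  std::ostringstream out;
  out << "[";
  for (std::size_t i = 0; i < events.size(); i++) {
    const TraceEvent &e = events[i];
    out << (i ? "," : "") << "{\"name\":\"" << e.name << "\",\"ph\":\"X\",\"ts\":" << e.ts_us
        << ",\"dur\":" << e.dur_us << ",\"pid\":" << e.pid << ",\"tid\":" << e.tid << "}";
  }
  out << "]";
  return out.str();
}
```

CPU and GPU timings of the same debug scope could be written as two events
with different `tid` values so a viewer shows them on separate tracks; that
layout is an assumption of this sketch.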
Pull Request: https://projects.blender.org/blender/blender/pulls/133557
Framebuffers are getting freed in the GPUContext base class destructor. But
the framebuffer destructors use the MTL/VK/GLContext derived class, whose
destructor has already completed at this point. So these contexts are no
longer valid to use.
Now free the framebuffers earlier.
This caused ASAN warnings, but it is not known to cause actual bugs.
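
The ordering problem can be sketched as follows. C++ runs the derived
class destructor before the base class destructor, so anything that needs
the backend context has to be released before the base destructor is
reached. `BaseContext`, `BackendContext` and `free_framebuffers` are
hypothetical stand-ins, and freeing from the derived destructor is only one
possible way to free earlier.

```cpp
#include <string>
#include <vector>

/* Records the order in which cleanup steps run. */
static std::vector<std::string> &cleanup_log()
{
  static std::vector<std::string> log;
  return log;
}

/* Hypothetical stand-in for the context base class. */
class BaseContext {
 public:
  virtual ~BaseContext()
  {
    /* The derived part is already destroyed here, so nothing that needs
     * the backend context may be freed at this point. */
    cleanup_log().push_back("~BaseContext");
  }

 protected:
  /* Called by derived destructors while the backend state is still valid. */
  void free_framebuffers()
  {
    cleanup_log().push_back("free framebuffers");
  }
};

/* Hypothetical stand-in for a backend specific context. */
class BackendContext : public BaseContext {
 public:
  ~BackendContext() override
  {
    free_framebuffers();
    cleanup_log().push_back("~BackendContext");
  }
};

/* Create and destroy a context, returning the recorded cleanup order. */
std::vector<std::string> run_cleanup()
{
  cleanup_log().clear();
  {
    BackendContext context;
  }
  return cleanup_log();
}
```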
Pull Request: https://projects.blender.org/blender/blender/pulls/132504
Ensure `gl_ViewportIndex` and `gl_Layer` are properly forwarded from the
geometry shader, and don't write to them from the vertex shader if
there's a geometry shader stage.
Fixes the Displacement "dicing" render tests on Nvidia OpenGL.
Pull Request: https://projects.blender.org/blender/blender/pulls/131875
Rendering animations from Python scripts via `bpy.ops.render.opengl()`
did not trigger any of the notifications in the Metal back-end to
indicate a frame had been rendered and that the associated resources
could be released. This adds a call to GPU_render_step() after each
render. For the original asset in the bug report, this reduces the peak
memory usage from 30 GB to 13 GB for 500 frames. 13 GB is likely still
too high, so there are probably additional leaks that need to be
addressed. This should only be considered a partial fix.
Authored by Apple: James McCarthy
Co-authored-by: James McCarthy <jamesmccarthy@apple.com>
Co-authored-by: Clément Foucault <foucault.clem@gmail.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/131085
GPUViewport is creating a bunch of framebuffer textures for itself, but
some space types never initialize/use them. E.g. Sequencer, Nodes etc.
only ever use the "overlay" texture. Eventually when viewport is
"drawn", it combines this uninitialized texture data and then only by
luck it happens that most of the time it is black. But not always!
The textures were only cleared (right now) on Metal backend, under
GPU_clear_viewport_workaround as if it was some driver workaround. Stop
doing that, and just clear them always.
However, there was seemingly a performance issue on OpenGL, when this
clear was being done. At least on my machine (Win10, Geforce RTX
3080Ti), the overhead of doing the clears is measurable, and is caused
by usage of GL 4.4 glClearTexImage instead of a framebuffer clear. It is
as if glClearTexImage makes the pixel data "exist" on the CPU side, and
binding this framebuffer later sends that data to the GPU, or something
similar.
More details in the PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/131518
There seems to be an issue inside the Intel OpenGL driver for legacy
platforms that makes it fail to link `gpu_shader_sequencer_strips`.
Uniform locations are used to fix a specialization constants issue.
This PR only adds the uniform location when the shader can be
specialized. It is unclear what is actually failing inside the driver
but there are other issues with the driver.
Pull Request: https://projects.blender.org/blender/blender/pulls/131293
This happened because NVIDIA GPUs require a higher alignment
for SSBO binds than for vertex inputs.
This is related to #131103, which fixed it for Vulkan.
Add a common capability option for that.
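
A minimal sketch of how such a capability could be consumed: offsets used
for SSBO binds are rounded up to the alignment reported by the backend.
`BufferCapabilities`, its fields and `ssbo_bind_offset` are hypothetical
names, and any concrete alignment values are device dependent.

```cpp
#include <cstddef>

/* Hypothetical capability record; field names are illustrative only. */
struct BufferCapabilities {
  std::size_t vertex_input_alignment;
  std::size_t storage_buffer_alignment;
};

/* Round `offset` up to the next multiple of `alignment` (must be non-zero). */
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment)
{
  return ((offset + alignment - 1) / alignment) * alignment;
}

/* Offset at which a sub-range of a buffer can be bound as an SSBO. */
constexpr std::size_t ssbo_bind_offset(const BufferCapabilities &caps, std::size_t offset)
{
  return align_up(offset, caps.storage_buffer_alignment);
}
```

An offset that is valid for a vertex input is therefore not necessarily
valid for an SSBO bind and may need padding.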
This port is not so straightforward.
This shader is used in different configurations and is
available to python bindings. So we need to keep
compatibility with different attributes configurations.
This is why attributes are loaded per component and a
uniform sets the length of the component.
Since this shader can be used from both the imm and batch
API, we need to inject some workarounds to bind the buffers
correctly.
The end result is still less versatile than the previous
Metal workaround (which supported more attribute fetch modes),
but it is also far less code.
### Limitations:
The new shader has some limitations:
- Both `color` and `pos` attributes need to be `F32`.
- Each attribute needs to be 4-byte aligned.
- Fetch type needs to be `GPU_FETCH_FLOAT`.
- Primitive type needs to be `GPU_PRIM_LINES`, `GPU_PRIM_LINE_STRIP` or `GPU_PRIM_LINE_LOOP`.
- If drawing using an index buffer, it must contain no primitive restart.
Rel #127493
Co-authored-by: Jeroen Bakker <jeroen@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/129315
OpenGL and Vulkan have workarounds for when gl_Layer/gl_ViewportIndex
isn't supported. In this case a geometry shader is generated. This
geometry shader doesn't follow the GLSL standard and doesn't work on
some platforms. This has not been an issue so far, as the platforms that
don't support gl_Layer/gl_ViewportIndex don't show the issue.
According to the specs, gl_Layer and gl_ViewportIndex should be set for
each call to EmitVertex. A shader should not rely on EmitVertex
reusing the same memory.
Ref https://www.khronos.org/opengl/wiki/Geometry_Shader#Layered_rendering
```
Warning: gl_Layer and gl_ViewportIndex are GS output variables. As such, every time
you call EmitVertex, their values will become undefined. Therefore, you must set
these variables every time you loop over outputs.
```
Issue detected during development of !129062
Pull Request: https://projects.blender.org/blender/blender/pulls/130506
Adding a dummy storage buffer to the classification shader
seems to fix the issue on Qualcomm drivers (WoA).
The workaround is added to the force workaround option to
allow other platforms to test the fix.
Rel #122837
Pull Request: https://projects.blender.org/blender/blender/pulls/129857
For C/C++, doc-strings should be located in headers.
Move function comments into the headers, in some cases merging them
with existing doc-strings, in other cases moving implementation
notes into the function body.
Avoid measuring the length of strings repeatedly by passing their
length along with their data with `StringRefNull`. Null termination
seems to be necessary still for passing the shader sources to OpenGL.
Though I doubt this is a bottleneck, it's still nice to avoid overhead from
string operations and this helps move in that direction.
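
As a rough sketch of the idea, a reference type can measure the string once
and carry the length next to the pointer, so later code never needs to call
`strlen` again. `CStringRef` and `total_source_size` are hypothetical names
for a minimal analogue and do not reproduce the actual `StringRefNull` API.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

/* Hypothetical minimal analogue of a null-terminated string reference:
 * the pointer and the length travel together. */
struct CStringRef {
  const char *data;
  std::size_t size;

  /* The length is measured once, at construction. */
  explicit CStringRef(const char *str) : data(str), size(std::strlen(str)) {}
};

/* Total source size computed from the stored lengths only. */
std::size_t total_source_size(const std::vector<CStringRef> &sources)
{
  std::size_t total = 0;
  for (const CStringRef &source : sources) {
    total += source.size;
  }
  return total;
}
```

The `data` pointer stays null-terminated, so it can still be handed to an
API that expects a C string.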
Pull Request: https://projects.blender.org/blender/blender/pulls/127702
The goal is to reduce the startup time cost of
all of this parsing and string replacement.
All comments are now stripped at compile time.
The comment check added a noticeable slowdown at
startup in debug builds and during preprocessing.
Put all metadata between a start and an end token.
Use very simple parsing using `StringRef` and
hash all identifiers.
Move all the complexity to the preprocessor, which
massages the metadata into the well-defined input
expected by the runtime parser.
All identifiers are hashed at compile time so that no string
comparison is made at runtime.
Speed up the source loading:
- from 10ms to 1.6ms (6.25x speedup) in release
- from 194ms to 6ms (32.3x speedup) in debug
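
The compile-time hashing can be sketched like this (C++17). The known
identifiers are hashed by the compiler and used as `case` labels, so at
runtime only the incoming identifier is hashed and compared as an integer.
The hash function used in the actual code is not stated here; FNV-1a and
the `Keyword` values are assumptions of this sketch.

```cpp
#include <cstdint>
#include <string_view>

/* FNV-1a hash, usable in constant expressions.
 * Assumption: chosen for brevity, not taken from the actual code. */
constexpr std::uint64_t hash_identifier(std::string_view text)
{
  std::uint64_t hash = 14695981039346656037ull;
  for (const char c : text) {
    hash ^= static_cast<unsigned char>(c);
    hash *= 1099511628211ull;
  }
  return hash;
}

/* Hypothetical set of identifiers the runtime parser cares about. */
enum class Keyword { Unknown, Vertex, Fragment };

/* Classify an identifier by comparing hashes instead of strings. */
Keyword classify(std::string_view identifier)
{
  switch (hash_identifier(identifier)) {
    case hash_identifier("vertex"):
      return Keyword::Vertex;
    case hash_identifier("fragment"):
      return Keyword::Fragment;
    default:
      return Keyword::Unknown;
  }
}
```

A side effect of using the hashes as `case` labels is that a collision
between two known identifiers becomes a compile error (duplicate case
value) instead of a silent runtime bug.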
Follow-up to #129009
Pull Request: https://projects.blender.org/blender/blender/pulls/128927
Move most of the string preprocessing used for MSL
compatibility to `glsl_preprocess`.
Enforce some changes like matrix constructor and
array constructor to the GLSL codebase. This is
for C++ compatibility.
Additionally reduce the amount of code duplication
inside the compatibility code.
Pull Request: https://projects.blender.org/blender/blender/pulls/128634
We can have deferred and non-deferred shaders (so, different threads)
with the same `additional_info` dependencies trying to finalize the same
`ShaderCreateInfo`.
This ensures `finalize` always runs from the main thread to avoid race
conditions.
Pull Request: https://projects.blender.org/blender/blender/pulls/128281
This rare GPU has z-fighting issues in editor mode. Might be fixable by
changing the bias, but would decrease precision on other platforms as
well. Better to move this GPU to limited support. It is working, just
has some drawing artifacts.
See #128179
Pull Request: https://projects.blender.org/blender/blender/pulls/128351
This uses the path that Metal was using.
This doesn't seem to create any difference in render
tests. This simplifies the backend code and avoids a
specific path for Metal.
Idea suggested by Kevin Chuang
Pull Request: https://projects.blender.org/blender/blender/pulls/127687
Parallel shader compilation introduced `GPU_shader_cache_dir_clear_old`.
The implementation was specific to OpenGL and could not be overwritten
by other backends. This PR improves the implementation so the backend
can have its own implementation.
This is needed for upcoming changes to the Vulkan backend where we
want to use similar mechanisms to speed up shader compilation and caching.
Pull Request: https://projects.blender.org/blender/blender/pulls/127680
This works around an issue where EEVEE was rendering a pure black cube in certain shader configurations in the default scene (#122837). This only affects X Elite devices (8cx Gen3 is unaffected).
Pull Request: https://projects.blender.org/blender/blender/pulls/127148
This adds a new launch argument when building with
RenderDoc support. It allows triggering the capture
of a specific capture scope, which enables selective
capture of some commonly captured parts.
Pull Request: https://projects.blender.org/blender/blender/pulls/126791
Workaround for RDNA2 shadow rendering when the geometry it needs
to render is too tiny. In this case the rasterizer can skip triangles,
leading to incorrect shadows.
This issue has been forwarded to AMD, but this is a temp workaround
for the current drivers. Note that this workaround adds a performance
penalty of around 50% in selected scenes.
Pull Request: https://projects.blender.org/blender/blender/pulls/126693
On legacy AMD devices EEVEE doesn't render any geometry. During testing
we found that this was related to reading the normal attribute. Further
testing showed that enabling high quality normals solves the
rendering.
This is a known issue on legacy AMD drivers. This PR updates the check
to enable the high quality normals workaround for the latest known AMD
legacy drivers (22.6.1/21.Q1.2). Both drivers still have this issue.
Pull Request: https://projects.blender.org/blender/blender/pulls/126483
No need to enable the GL_ARB_conservative_depth extension as it is
core in GLSL 4.20. Some drivers still complain that the
extension was explicitly enabled.
Detected on AMD 21.Q2.1 (27.20.21026.2006) legacy driver.
Pull Request: https://projects.blender.org/blender/blender/pulls/126223