With 5.0 we start requiring this extension for GL and VK.
All of our target hardware supports it with up-to-date
drivers.
Some old drivers were disabling this extension because of
buggy behavior. We simply drop support for them in 5.0.
This allows us to remove a lot of code and the last
shader create info override done at startup. This will
unlock more refactoring of the shader create info into
static classes to reduce binary size and other benefits.
## TODO:
- [x] Remove checks for ARB_shader_draw_parameters
- [x] Remove checks for ARB_clip_control
- [x] Check for the extension on startup for OpenGL
- [x] Check for the extension on startup for Vulkan
- [x] ~~Add user facing popup message about minimum
requirements not being met.~~ Done using the same
popup as old hardware.
Pull Request: https://projects.blender.org/blender/blender/pulls/142334
Vulkan 1.0 render passes have been replaced by dynamic rendering in 1.2.
Blender Vulkan backend was implemented with dynamic rendering in mind.
All our supported platforms support dynamic rendering. Render pass support
was added to try to work around an issue with legacy drivers. However these
drivers also fail with render passes.
Using render passes had several limitations (blending and some workbench
features were not supported). As no GPU uses them and supporting them takes
quite some code, it is better to remove them.
Pull Request: https://projects.blender.org/blender/blender/pulls/144149
On some drivers, the GLSL compiler doesn't reflect the omitted
`local_size_*` of a compute shader inside `gl_WorkGroupSize`.
This led to the 2D size computation of 1D workgroups becoming
0, which bypassed the parallel reduction algorithms.
Ensuring `local_size_*` are always set fixes the issue.
For clarity, also fix the 1D shaders to not use `gl_WorkGroupSize.y`.
This also fixes a copy-paste error in the Metal backend.
This issue affected AMD drivers on Windows.
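As a minimal sketch of "always set `local_size_*`", the helper below builds a GLSL compute layout declaration that spells out all three dimensions, defaulting the unspecified ones to 1. The function name and its defaults are assumptions for illustration, not the code from the pull request.

```cpp
#include <string>

// Hypothetical helper (not Blender API): always emits all three
// `local_size_*` qualifiers so `gl_WorkGroupSize` never depends on how a
// driver handles omitted dimensions.
std::string compute_layout_declaration(int size_x, int size_y = 1, int size_z = 1)
{
  return "layout(local_size_x = " + std::to_string(size_x) +
         ", local_size_y = " + std::to_string(size_y) +
         ", local_size_z = " + std::to_string(size_z) + ") in;";
}
```

With every dimension explicit, a 1D shader still sees a size of 1 in `y` and `z`, so a product such as `gl_WorkGroupSize.x * gl_WorkGroupSize.y` cannot collapse to 0.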
Rel #142046
Candidate for backporting to 4.5 LTS.
Pull Request: https://projects.blender.org/blender/blender/pulls/144056
The Map UV node does not work when the UV input is a single value, where
it is expected that the output will also be single value. This was
simply not implemented for GPU, so this patch does that.
Pull Request: https://projects.blender.org/blender/blender/pulls/143096
When creating `GPUSecondaryContext`s, `epoxy_gl_version` returns 0 and
`epoxy_has_gl_extension` always returns false.
This is caused by `GPU_context_create` being called without the ghost
context being activated.
Activating it fixes the issue.
Pull Request: https://projects.blender.org/blender/blender/pulls/142715
Shader compilation no longer uses the `WM_job` API.
Add a `GPU_shader_batch_is_compiling` function to query if there's any
shader compilation happening, and update `bpy_app_is_job_running` to
handle this as a special case.
Pull Request: https://projects.blender.org/blender/blender/pulls/143559
Recursive downsample was only used by workbench DoF
which can be expressed using mip view instead.
The mip render workaround was creating a GL error on startup
and is not needed anymore.
Pull Request: https://projects.blender.org/blender/blender/pulls/143246
This allows storing the number of vertices to draw
per batch without specifying any attributes.
It makes it possible to create batches that are empty but
still hold the amount of geometry to produce.
Needed for new curve drawing #142969.
Pull Request: https://projects.blender.org/blender/blender/pulls/143052
There is no need to initialize index buffers with zero since such
buffers always have to be filled by the caller. This change replaces
the allocation with malloc, so that GPU_indexbuf_init results in an
uninitialized buffer. In debug builds, and with ASan, the buffer will still
be filled with something, but the caller should write zero indices
explicitly instead of relying on a default value.
For example, sometimes the cost of zeroing on allocation is similar to
the cost of filling the buffer with actual data. For a point cloud with
1'000'000 points, octahedron topology update on each frame of simulation
takes:
| Function | Main | PR |
| -------------------------- | ------- | ------- |
| GPU_indexbuf_init | 2.75 ms | 5233 ns |
| pointcloud_extract_indices | 6.95 ms | 4.64 ms |
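
The caller-side contract can be sketched as follows. This is an illustration under assumptions: `alloc_indices_uninitialized` and `make_sequential_indices` are hypothetical stand-ins, not the `GPU_indexbuf_init` implementation. The allocation uses `malloc` (no zeroing), and the caller writes every element before reading any of them.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <new>
#include <numeric>

// Deleter so that memory obtained from `malloc` is released with `free`.
struct FreeDeleter {
  void operator()(void *ptr) const
  {
    std::free(ptr);
  }
};
using IndexBuffer = std::unique_ptr<uint32_t[], FreeDeleter>;

// Hypothetical stand-in for an index buffer allocation: the contents are
// uninitialized, nothing is zeroed here.
IndexBuffer alloc_indices_uninitialized(std::size_t len)
{
  void *data = std::malloc(len * sizeof(uint32_t));
  if (len > 0 && data == nullptr) {
    throw std::bad_alloc();
  }
  return IndexBuffer(static_cast<uint32_t *>(data));
}

// The caller is responsible for writing every index, including zeros.
IndexBuffer make_sequential_indices(std::size_t len)
{
  IndexBuffer indices = alloc_indices_uninitialized(len);
  std::iota(indices.get(), indices.get() + len, 0u);
  return indices;
}
```

Because every element is overwritten anyway, zeroing at allocation time would only add a second pass over the memory.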
Pull Request: https://projects.blender.org/blender/blender/pulls/141110
`pygpu_shader_attrs_info_get` tries to query information for all
vertex attributes that are added via `VERTEX_IN`. However, some drivers
optimize compiled shaders so that unused vertex attributes are
removed. This fix makes sure that the input length
used in `GPU_shader_get_attribute_len` does not exceed the actual max
binding number.
Pull Request: https://projects.blender.org/blender/blender/pulls/137584
The numeric levels have no obvious meaning. This removes the distinction
between severity and levels, instead there is a single list of named levels
with defined meaning.
Debug means information that's mainly useful for developers, and trace is for
very verbose code execution tracing.
Pull Request: https://projects.blender.org/blender/blender/pulls/140244
Reading from the top-right of the selection buffer could read
past the buffer bounds. Resolve by ensuring the clamped buffer
isn't empty. Relates to #141591.
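A minimal sketch of this kind of guard, assuming a rectangle with exclusive maximum coordinates; the `Rect` type and function name are hypothetical and not taken from the selection code. The read region is clamped to the buffer and the caller skips the read when nothing is left.

```cpp
#include <algorithm>

// Hypothetical rectangle type; `xmax` and `ymax` are exclusive.
struct Rect {
  int xmin, ymin, xmax, ymax;
};

// Clamp `rect` to a buffer of `width` x `height` pixels.
// Returns false when the clamped region is empty, in which case the caller
// must not read from the buffer at all.
bool clamp_rect_to_buffer(Rect &rect, int width, int height)
{
  rect.xmin = std::max(rect.xmin, 0);
  rect.ymin = std::max(rect.ymin, 0);
  rect.xmax = std::min(rect.xmax, width);
  rect.ymax = std::min(rect.ymax, height);
  return rect.xmin < rect.xmax && rect.ymin < rect.ymax;
}
```

A region that starts exactly at the top-right corner clamps to zero width and height, which is the case that has to be rejected before reading.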
The issue was that this operator is using the same
path as the viewport display. So the colors are
clamped if the display is not HDR.
This fix is easy now that we have an HDR path for
the viewport display. We just enforce using it
when doing the viewport render preview.
GPU_DATA_UINT_24_8 isn't used anymore. We cannot phase out the data type
as it can still be used by add-ons. This PR will deprecate
`GPU_DATA_UINT_24_8`. When used in an add-on a deprecation message will
be shown.
Pull Request: https://projects.blender.org/blender/blender/pulls/140715
Metal and AMD/Intel/Vulkan don't support depth24 texture formats
natively. The backends implemented a fallback that uses depth32f
instead.
Recently we replaced all usages of depth24 with depth32f, and the
next step is to remove the depth24 format and the workarounds in
the backends.
Note: The removal of `GPU_DATA_UINT_24_8` isn't part of this PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/140659
Blender uses depth24 for legacy reasons. All backends that we support
have support for depth32f.
This PR updates all usages of depth24 with depth32f.
- depth24 is not supported on AMD/Intel/Vulkan and Metal. There, depth32f
was already used to work around this limitation.
- This allows us to implement reverse depth in workbench, overlay and
grease pencil in the future.
Pull Request: https://projects.blender.org/blender/blender/pulls/140531
The polyline workarounds were not working as expected
since #139627, as it was not guaranteed that the polyline
shaders would be correctly initialized with the workaround
tag.
Adding a wrapper class to ensure the initialization fixes
the issue.
This reduces the time needed to get to the first pixel
on screen by multithreading the builtin shader compilation.
We avoid doing this if subprocess compilation is on, as
the overhead of potentially partially starting all subprocesses
is far greater than the benefit of parallel compilation.
For some reason, the compilation is much slower when
done async for these shaders (on Metal ~200ms > ~1.2ms),
so the saving might not be substantial.
Mac M1: First frame 6s > 5s.
Pull Request: https://projects.blender.org/blender/blender/pulls/139627
The Extend Bounds input has no effect when the Fast Gaussian filter is
used. Similarly, it has no effect if the Bokeh Blur node is using
variable size. This is a known limitation and was just not implemented.
So to fix this, we implement a general solution that works globally
across the node by pre-padding the inputs of the blur. This uses more
memory but also speeds up the base case when Extend Bounds is disabled,
while also reducing the binary size due to fewer blur specializations.
The variable size Bokeh Blur test was updated since Extend Bounds was
silently ignored.
Pull Request: https://projects.blender.org/blender/blender/pulls/140192
Prevent race conditions caused by calling `GPUWorker::wake_up` when the
worker is not waiting.
Found to be an issue in #139627, since `wake_up` is likely to be called
before the thread has fully started.
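One common way to make an early `wake_up` safe is to pair the condition variable with a flag, so that a notification sent before the worker starts waiting is not lost. The sketch below illustrates that general pattern with a hypothetical `WorkerSignal` class; it is an assumption for illustration and not necessarily how `GPUWorker` was fixed.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical wake-up primitive: the `pending_` flag remembers a wake-up
// that arrives while nobody is waiting.
class WorkerSignal {
  std::mutex mutex_;
  std::condition_variable condition_;
  bool pending_ = false;

 public:
  // May be called at any time, even before the worker thread has started.
  void wake_up()
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      pending_ = true;
    }
    condition_.notify_one();
  }

  // Returns immediately if a wake-up is already pending, otherwise blocks.
  void wait_for_work()
  {
    std::unique_lock<std::mutex> lock(mutex_);
    condition_.wait(lock, [this] { return pending_; });
    pending_ = false;
  }

  bool has_pending()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return pending_;
  }
};
```

Since the predicate is checked under the mutex before blocking, calling `wake_up` first and `wait_for_work` later does not hang.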
Pull Request: https://projects.blender.org/blender/blender/pulls/139842
This allows generating source files that will
be injected into a predefined source dependency tree.
This allows a much cleaner shader workflow where
all sources are explicitly referenced from the
main source file.
Pull Request: https://projects.blender.org/blender/blender/pulls/140047
This prevents the use of unaligned data types in
vertex formats. These formats are not supported on many
platforms.
This simplifies the `GPUVertexFormat` class a lot as
we do not need packing shenanigans anymore and just
compute the vertex stride.
The old enums are kept for progressive porting of the
backends and user code.
This will break compatibility with Python add-ons.
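To illustrate why the stride computation becomes simple, here is a minimal sketch that assumes "aligned" means every attribute size is a multiple of 4 bytes. That rule and the function name are assumptions for illustration; the text above does not state the exact alignment requirement.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not the `GPUVertexFormat` implementation).
// Assumption: an attribute is "aligned" when its size is a multiple of 4 bytes.
// Returns false and leaves `r_stride` untouched if any attribute is unaligned.
bool compute_vertex_stride(const std::vector<uint32_t> &attribute_sizes, uint32_t &r_stride)
{
  uint32_t stride = 0;
  for (const uint32_t size : attribute_sizes) {
    if (size == 0 || size % 4 != 0) {
      return false;
    }
    /* No padding needed: every offset is already a multiple of 4. */
    stride += size;
  }
  r_stride = stride;
  return true;
}
```

Under that assumption, a format with a 12 byte position, an 8 byte UV and a 4 byte color has a stride of 24 bytes without any packing logic.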
TODO:
- [x] Deprecation warning for PyGPU (4.5)
- [x] Deprecate matrix attributes
- [x] Error handling for PyGPU (5.0)
- [x] Backends
- [x] Metal
- [x] OpenGL
- [x] Vulkan
Pull Request: https://projects.blender.org/blender/blender/pulls/138846
This works by wrapping the entry point call inside a
`main` function.
Since resources are still defined in global space,
functions accessing them are marked with a custom
attribute. This custom attribute expands into an
`#ifdef` guard for the matching stage.
This is a temporary solution and will eventually
be lifted once we support SRD.
### TODO
- [ ] Implement `[[gpu::vertex/fragment_function]]`.
Pull Request: https://projects.blender.org/blender/blender/pulls/139233
This reduces the waiting time caused by
shader compilation on some GPU-driver combinations.
New settings in the User Preferences make it
possible to override the default number of worker
threads and optionally use subprocesses.
We still use only one worker thread in cases where
there is no benefit in adding more workers
(like the AMD Pro driver and Intel on Windows).
It doesn't scale as much as subprocesses for material
shader compilation but that is for other reasons
explained in #139818.
Add some heuristics to avoid too much memory usage
and/or too many stalls.
Also add some heuristics for the default number of subprocesses on
the platforms that show scaling.
Historically, multithreaded compilation was prevented by the
need for a context per thread inside the `DRWShader` module.
Also, there was no good scaling at that time. But
nowadays the numbers show different results, with
good scaling with a reasonable number of threads on many
platforms.
Even if we are going for Vulkan in the next release,
most of the legacy hardware will still use OpenGL for
a few more releases. So it is relevant to make this
easy improvement.
See pull request for measurements.
Pull Request: https://projects.blender.org/blender/blender/pulls/139821
This has limited use cases since it doesn't
profile the heavy part of the Vulkan backend.
Almost 1:1 port of the Metal implementation from #139551.
Doesn't cover render graph submission nor GPU timings.
Pull Request: https://projects.blender.org/blender/blender/pulls/139899
When drawing the image scope, the Vulkan backend raised some asserts. After
checking them, the issue turned out to be in the calling code, which could
lead to undefined behavior on other platforms as well.
It isn't allowed to have an immediate mode shader bound when performing
batch drawing. There was also a point batch that didn't use any point
shader, resulting in undefined behavior as well.
For 5.0 we should add this as a GPU module check.
Pull Request: https://projects.blender.org/blender/blender/pulls/139926