While investigating Blender compilation time for windows-arm64, we
identified two compilation units that were taking a long time to compile
(~1h each). This affects windows-x64 builds as well.
Pull Request: https://projects.blender.org/blender/blender/pulls/117534
A draw list bundles multiple draw commands for the same geometry
and sends the draw commands in a single command. This reduces
the overhead of pipeline checking, resource validation and can
keep the load higher on the gpu as more work needs to be done.
Previously the draw list didn't bundle any commands and would still
send each call separately to the GPU. This PR implements the bundling
of the commands.
Pull Request: https://projects.blender.org/blender/blender/pulls/117548
Small change to always opt-in to using
invariant position in the vertex shader.
This ensures precision between position
calculations from different shaders which
need to produce the exact same result, by
disabling fastMath on only those instructions.
After benchmarking, the impact of this change
does not appear to affect performance bottlenecks
but will reduce the need for additional bias calculations.
Authored by Apple: Michael Parkin-White.
Pull Request: https://projects.blender.org/blender/blender/pulls/117478
When a shader performs a geometry shader injectoin to work around
features that are not supported natively on the GPU (viewport,
barycentric coordinates, layered rendering), linking would fail.
The reason was that the geometry shader was stored in a slot that was
patched by the specialization constants, resulting in an empty geometry
shader. An empty shader can be compiled, but doesn't match the interface
with other stages, so the linking would fail.
This fixes the issue that EEVEE crashed on Intel iGPUs. These GPUs
don't support viewports.
Pull Request: https://projects.blender.org/blender/blender/pulls/117440
This PR improves the place when shader stages are attached to glPrograms.
Previously it was done when shaders stages where created, in the function
create_shader_stage.
This PR will attach the shader stages inside link program.
Ensuring that create_shader_stage doesn't alter the program, which isn't
clear in its name.
Pull Request: https://projects.blender.org/blender/blender/pulls/117407
The term `PIL` stands for "platform independent library." It exists since the `Initial Revision`
commit from 2002. Nowadays, we generally just use the `BLI` (blenlib) prefix for such code
and the `PIL` prefix feels more confusing then useful. Therefore, this patch renames the
`PIL` to `BLI`.
Pull Request: https://projects.blender.org/blender/blender/pulls/117325
Ensure attachment states and load/store configs don't get out of sync
with the framebuffer layout.
In theory, a Framebuffer could have empty attachments interleaved with
valid ones so checking just the attachments "length" is not enough.
What this does instead is to ensure that valid attachments have a valid
config and that null attachments either don't have a matching config or
have an IGNORE/DONT_CARE one.
Pull Request: https://projects.blender.org/blender/blender/pulls/117073
Generated copies of GLSL sources are kept in a std::string and
it was always accessed by a long living StringRefNull which lead
to potential read from unallocated memory as std::strings are
not null terminated.
Pull Request: https://projects.blender.org/blender/blender/pulls/117120
Allows specification of per-shader threadgroup memory tuning
to optimise performance through increase of GPU occupancy.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/115238
This PR adds support for specialization constants for the OpenGL
backend. The minimum OpenGL version we are targetting doesn't
have native support for specialization constants. We simulate this
by keeping track of shader programs for each set of specialization
constants that are being used.
Specialization constants can be used to reduce shader complexity
and improve performance as less registry and/or spilling is done.
This requires the ability to recompile GLShaders. In order to do this
we need to keep track of the sources that are used when the shader
was compiled. For static sources we only store references
(`GLSource::source_ref`), for dynamically generated sources we keep
a copy of the source (`GLSource::source`).
When recompiling the shader GLSL source-code is generated for the
constants stored in `Shader::constants`. When compiling the previous
GLSource that contains specialization constants is then replaced
by the new version.
Pull Request: https://projects.blender.org/blender/blender/pulls/116926
This PR cleans up shader builder CMake files and fixes vulkan includes
that could not be found on all platforms.
* Reduce code duplication
* Use private var for MANIFEST on windows
* Add system includes when compiling
Pull Request: https://projects.blender.org/blender/blender/pulls/115889
When anisotropic filtering is enabled on a sampler its value must be
between 1 and 16. In blender it is possible to set a value lower than 1.
0 actually means that anisotropic filtering is disabled in Blender.
This would trigger a validation error in Vulkan.
Pull Request: https://projects.blender.org/blender/blender/pulls/117018
The Specialization Shader workaround generated code that wasn't Vulkan
GLSL compliant. The uintBitsToFloat cannot be used during global const
initialization.
This is a temporary work around until we implement the Specialization
Constants for Vulkan.
Pull Request: https://projects.blender.org/blender/blender/pulls/116942
Even if related, they don't have the same performance
impact.
To avoid any performance hit, we replace the Diffuse
by a Subsurface Closure for legacy EEVEE and
use the subsurface closure only where needed for
EEVEE-Next leveraging the random sampling.
This increases the compatibility with cycles that
doesn't modulate the radius of the subsurface anymore.
This change is only present in EEVEE-Next.
This commit changes the principled BSDF code so that
it is easier to follow the flow of data.
For legacy EEVEE, the SSS switch is moved to a
`radius == -1` check.
This avoid the cost of creating the tiles themselves which uses a lot
texture write. This was a bottleneck on Apple GPUs.
Also the per pixel classification allows us to remove certain checks in
the deferred lighting shader making it faster.
### TODO
- [x] Add gl_FragStencilRefARB support on other backend
- [x] Add workaround for when gl_FragStencilRefARB isnt supported
Pull Request: https://projects.blender.org/blender/blender/pulls/116704
Bundling many tests in a single binary reduces build time and disk space
usage, but is less convenient for running individual tests command line
as filter flags need to be used.
This adds WITH_TESTS_SINGLE_BINARY to generate one executable file per
source file. Note that enabling this option requires a significant amount
of disk space.
Due to refactoring, the resulting ctest names are a bit different than
before. The number of tests is also a bit different depending if this
option is used, as one uses gtests discovery and the other is organized
purely by filename, which isn't always 1:1.
Co-authored-by: Sergey Sharybin <sergey@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/114604
Along with the 4.1 libraries upgrade, we are bumping the clang-format
version from 8-12 to 17. This affects quite a few files.
If not already the case, you may consider pointing your IDE to the
clang-format binary bundled with the Blender precompiled libraries.
Render to Render workload depedency not correctly syncing
in Metal. PR adds hard pass break where GPU_memory_barrier's
occur during render workloads to ensure non-pixel-local writes
are visible to subsequent render invocations as needed. This is
required to support full pass dependencies on a tile-based GPU
architecture.
Note that these barriers are therefore expensive, so are skipped
where dependencies are local and fragment execution order is
well-represented via either blend order or explicit
raster_order_groups.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/116656