This PR introduces parallel shader compilation for Vulkan shader
modules. This will improve shader compilation when switching to material
preview or EEVEE render preview. It also improves material compilation.
However in order to measure the differences shaderc needs to be updated.
PR has been created so we can already start with the code review. This
PR doesn't include SPIR-V caching, what will land in a separate PR as
it needs more validation.
Parallel shader compilation has been tested on AMD/NVIDIA on Linux.
Testing on other platforms is planned in the upcoming days.
**Performance**
```
AMD Ryzen™ 9 7950X × 32, 64GB Ram
Operating system: Linux-6.8.0-44-generic-x86_64-with-glibc2.39 64 Bits, X11 UI
Graphics card: Quadro RTX 6000/PCIe/SSE2 NVIDIA Corporation 4.6.0 NVIDIA 550.107.02
```
*Test*: Start blender, open barbershop_interior.blend and wait until the viewport
has fully settled.
| Backend | Test | Duration |
| ------- | ------------------------- | -------- |
| OpenGL | Coldstart/No subprocesses | 1:52 |
| OpenGL | Coldstart/8 Subprocesses | 0:54 |
| OpenGL | Warmstart/8 Subprocesses | 0:06 |
| Vulkan | Coldstart Without PR | 0:59 |
| Vulkan | Warmstart Without PR | 0:58 |
| Vulkan | Coldstart With PR | 0:33 |
| Vulkan | Warmstart With PR | 0:08 |
The difference in time (why OpenGL is faster in a warm start is that all
shaders are cached). Vulkan in this case doesn't cache anything and all
shaders are recompiled each time. Caching the shaders will be part of
a future PR. Main reason not to add it to this PR directly is that SPIR-V
cannot easily be validated and would require a sidecar to keep SPIR-V
compatible with external tools..
**NOTE**:
- This PR was extracted from #127418
- This PR requires #127564 to land and libraries to update. Linux lib
is available as attachment in this PR. It works without, but is as slow as
single threaded compilation.
Pull Request: https://projects.blender.org/blender/blender/pulls/127698
Windows/Intel and Apple drivers do not support dynamic
rendering unused attachments. Due to mistakes we made
this extension partly optional. Eg. the extension was
optional, but its settings were not.
This PR makes the extension fully optional. However
without the extension some drivers might make incorrect
assumptions. This should be solved when it is more clear
why some drivers are still crashing when using dynamic
rendering.
Pull Request: https://projects.blender.org/blender/blender/pulls/127839
Blender codebase had two ways to convert half (FP16) to float (FP32):
- BLI_math_bits.h half_to_float. Out of 64k possible half values, it converts
4096 of them incorrectly. Mostly denormals and NaNs, which is perhaps not too
relevant. But more importantly, it converts half zero to float 0.000030517578
which does not sound ideal.
- Functions in Vulkan vk_data_conversion.hh. This one converts 2046 possible
half values incorrectly.
Function to convert float (FP32) to half (FP16) was in Vulkan
vk_data_conversion.hh, and it got a bunch of possible inputs wrong. I guess it
did not do proper "round to nearest even" that CPU/GPU hardware does.
This PR:
- Adds BLI_math_half.hh with float_to_half and half_to_float functions.
- Documentation and test coverage.
- When compiling on ARM NEON, use hardware VCVT instructions.
- Removes the incorrect half_to_float from BLI_math_bits.h and replaces single
usage of it in View3D color picking to use the new function.
- Changes Vulkan FP32<->FP16 conversion code to use the new functions, to fix
correctness issues (makes eevee_next_bsdf_vulkan test pass). This makes it
faster too.
Pull Request: https://projects.blender.org/blender/blender/pulls/127708
ShaderC compiler was cached on the Vulkan backend. The compiler itself
is light-weight and doesn't require any caching. This PR removes the
cached instance from the backend.
Pull Request: https://projects.blender.org/blender/blender/pulls/127693
This uses the path that metal was using.
This doesn't seems to create any difference in render
tests. This simplify the backend code and avoid
specific path for metal.
Idea suggested by Kevin Chuang
Pull Request: https://projects.blender.org/blender/blender/pulls/127687
Parallel shader compilation introduced `GPU_shader_cache_dir_clear_old`.
The implementation was specific to OpenGL and could not be overwritten
by other backends. This PR improves the implementation so the backend
can have its own implementation.
This is needed for upcoming changes to the Vulkan backend where we
want to use similar mechanisms to speed up shader compilation and caching.
Pull Request: https://projects.blender.org/blender/blender/pulls/127680
This class allows to define capture scopes
that can be chosen during gpu work capture.
This reduces the amount of command captured
and allow for faster replay and easier
navigation inside the debug tools like Xcode
or RenderDoc.
Vulkan would crash when generating a thumbnail in case there is no
3D viewport in the active workspace. When this happens the front buffer
is downscaled as thumbnail. We didn't create a front buffer as swap
chains are handled differently.
We work around this issue in the same way as Metal does. Create a dummy
front framebuffer and share the surface texture. When the thumbnail
generator reads from the front buffer it will read the data that was
created by the back buffer.
Pull Request: https://projects.blender.org/blender/blender/pulls/127393
Edit mode selection returns incorrect indices. The reason is
that the extent of the area downloaded to the CPU for evaluation was
incorrect and read pixels could not in the place where they are
actually stored.
Was an oversight when fixing face selection. It wasn't that noticeable
there as faces typically cover a larger space.
Pull Request: https://projects.blender.org/blender/blender/pulls/127386
Make sure all printing happens inside render boundaries
since it needs to read a storage buffer which needs to
record some commands inside command buffers.
The encoding of mat3 in std430 was incorrect leading to a drawing
artifact in the direction control of sunlight in sky textures.
The error was that every 3 floats requires an additional float
as each row of the mat3 is aligned to 16 bytes.
Pull Request: https://projects.blender.org/blender/blender/pulls/127246
Renderdoc only export extensions it supports.
`VK_EXT_dynamic_rendering_unused_attachments` is not supported by
renderdoc and therefor no devices could be found to start the vulkan
backend.
In GHOST we already work around this issue by not checking this specific
extension. We should do the same in VKBackend.
Pull Request: https://projects.blender.org/blender/blender/pulls/127236
Subtexture reading is supported via GPUFramebuffer. The input
parameters was an rect, but is called an area which includes
width and height.
Due to inconsistent naming the area was assumed to be a region leading
to incorrect sub texture being downloaded or crashes due to out of
bound writes.
This fix crashes when selecting in edit mode.
Pull Request: https://projects.blender.org/blender/blender/pulls/127229
This PR allows users to select a GPU backend.
In the system tab of the user preferences the GPU backend can be selected in the `Display Graphics` panel.
It will require a restart of Blender before the changes become effective.
During startup minimum requirements are checked. Blender will switch automatically
to OpenGL when no compatible Vulkan device could be detected. A dialog will be shown
to inform the user.
The setting of the in the `Display Graphics` panel are still overridden when blender is started
using the `--gpu-backend` option. When starting blender with `--debug-gpu` the backend
detection will print to the console.
See PR for detailed information and screenshots of the UI.
Implements #126504
Pull Request: https://projects.blender.org/blender/blender/pulls/126545
GPU depth picking was not working on all GPUs. When a GPU requires a
DEPTH32F to store depths the conversion to unsigned normalized could
wrap around. Making depths of 1.0 become 0. In stead of MAX_DEPTH.
This solves depth picking issues on Intel and AMD GPUs.
This PR introduces disk cache for static pipelines. The pipelines are
stored in `<cache folder>/vk-pipeline-caches/static-shaders.bin`.
Due to limitations in some drivers we add a custom header to the
cache file to identify if the cache file was created by the same driver
for the same GPU for the same Blender.
Reading/writing the cache is skipped when running blender with
`--debug-gpu` as that would generate different shader modules. For
now that isn't a problem, but the final implementation would check
before compiling a shader if a certain key is in the pipeline cache if
that is the case the compilation step is skipped and the cached shader
module is used.
Reference: #126229
Pull Request: https://projects.blender.org/blender/blender/pulls/127110
Base pipelines are used to optimize the vulkan pipeline creation.
Instead of building each pipeline from scratch a base pipeline can
be used to share resources and faster code-paths.
We used to overwrite the base pipeline with the last created pipeline.
This PR doesn't overwrite the base pipeline after it was initially set
giving less confusion when working with base pipelines.
Pull Request: https://projects.blender.org/blender/blender/pulls/127181
This works around an issue where eevee was rendering a pure black cube in certain shader configurations in the default scene (#122837). This only affects X Elite devices (8cx Gen3 is unaffected).
Pull Request: https://projects.blender.org/blender/blender/pulls/127148
This change does some preparations before we implement persistent
caching for static shaders.
- Move ownership of pipeline cache to the pipeline pool.
- Use two pools. one is only used for static shaders
other for non static shaders.
Related to #126229
Pull Request: https://projects.blender.org/blender/blender/pulls/127100
`atan2(0, 0)` is undefined on many platforms. To ensure consistent
result across platforms, we return `0` in this case.
Note only the behavior of the shader node `Artan2` is changed here.
During shading, we might still produce `atan2(0, 0)` internally and
cause different results across platforms, but that usually happens with
single samples and is not obvious, plus checking this condition all the
time is costly. If later we find out it's indeed necessary to change all
the invocation of `atan2(0, 0)`, we could change the wrapper functions
in `metal/compat.h` and `mtl_shader_defines.msl`.
Pull Request: https://projects.blender.org/blender/blender/pulls/126951
VSE timeline, when many (hundreds/thousands) of thumbnails were visible, was
very slow to redraw. This PR makes them 3-10x faster to redraw, by stopping
doing things that are slow :) Part of #126087 thumbnail improvements task.
- No longer do mute semitransparency or corner rounding on the CPU, do it in
shader instead.
- Stop creating a separate GPU texture for each thumbnail, on every repaint,
and drawing each thumbnail as a separate draw call. Instead, put thumbnails
into a single texture atlas (using a simple shelf packing algorithm), and
draw them in batch, passing data via UBO. The atlas is still re-created every
frame, but that does not seem to be a performance issue. Thumbnails are
cropped horizontally based on how much of their parts are visible (e.g. a
narrow strip on screen), so realistically the atlas size is kinda
proportional to screen size, and ends up being just several megabytes of data
transfer between CPU -> GPU each frame.
On this Sprite Fright edit timeline view (612 visible thumbnails), time taken
to repaint the timeline window:
- Mac (M1 Max, Metal): 68.1ms -> 4.7ms
- Windows (Ryzen 5950X, RTX 3080Ti, OpenGL): 23.7ms -> 6.8ms
This also fixes a visual issue with thumbnails, where when strips are very
tall, the "rounded corners" that were poked right into the thumbnail bitmap
on the CPU were showing up due to actual bitmap being scaled up a lot.
Pull Request: https://projects.blender.org/blender/blender/pulls/126972
This adds a new launch argument when building with
renderdoc support. It allows to trigger the capture
of a specific capture scope. This allows selective
capture of some commonly captured parts.
Pull Request: https://projects.blender.org/blender/blender/pulls/126791
Solved by not supporting complex reordering at this moment. It
is currently better to focus on quality and add back performance
later. During tests I didn't detect any user noticeable performance
degradation. Could also be because we support rendering
suspending/resuming.
Fixes artifacts in EEVEE volumes and raytracing.
Pull Request: https://projects.blender.org/blender/blender/pulls/126974
- Resource pools are shared between multiple swap chains to reduce
code complexity
- Fix issue where activating a new graphical context could still leave
the previous context rendering.
- Known issue: opening files with more windows require a redraw.
Reference: #126499
Pull Request: https://projects.blender.org/blender/blender/pulls/126961
When 'really' large buffers are allocated the incorrect clamping
function was used and the actual buffer allocated was smaller than
expected.
This was detected during investigation of #126863.
Pull Request: https://projects.blender.org/blender/blender/pulls/126915
Add Metallic BSDF Node to the shader editor.
This node can primarily be used to create more realistic looking
metallic materials than the existing Glossy BSDF node.
This commit does not add any new closures to Cycles, it simply exposes
existing closures that were previous hard to access on their own.
- Exposes the F82 fresnel type that is currently used by the
metallic component of the Principled BSDF. Results should match
between the Metallic BSDF and Principled BSDF when using the same
settings.
- Exposes the Physical Conductor fresnel type that was previously
limited to custom OSL scripts. The Conductor fresnel type accepts
IOR and Extinction coefficients to define the appearance of the
material based off real life measurements.
EEVEE only supports the F82 fresnel type with internal code to convert
the the physical conductor inputs in to a colour format for F82,
which can lead to noticeable rendering differences with
some configurations.
Pull Request: https://projects.blender.org/blender/blender/pulls/114958
When orcos are not available in the geometry data it is bound with
a default buffer. This buffer was initialized with only zeros. But
orcos required to read the 0, 0, 0, 1.
Fixes `render/shader/tex_voronoi.blend` render test.
Pull Request: https://projects.blender.org/blender/blender/pulls/126838
When rendering in background there is no swap chain in play and
resources would never be collected. Resulting in out of resources and
crashes.
This PR is an initial fix what cycles resource pools every time a layer
has been rendered. It counts the render hierarchy
(GPU_rendering_begin/end) to identify if the background render is
completed and if so would cycle to the next resource pool.
Pull Request: https://projects.blender.org/blender/blender/pulls/126819
Workaround for RDNA2 shadow rendering when the geometry it needs
to render is to tiny. In this case the rasterizer can skip triangles
leading to incorrect shadow.
This issue has been forwarded to AMD, but this is a temp workaround
for the current drivers. Note that this workaround adds a performance
penalty of around 50% in selected scenes.
Pull Request: https://projects.blender.org/blender/blender/pulls/126693
Fix crash when using EEVEE irradiance baking. When reading back the
intermediate result the active rendering was not ended, resulting
in an assert as the rendergraph is cleared and assumed to be in an
initial state (not rendering).
Pull Request: https://projects.blender.org/blender/blender/pulls/126688
Metal doesn't support RGB textures and the backend converts them to RGBA textures.
During the conversion missing RGB components should be set to 0 and missing A
component should be set to 1.
In the current implementation this was not the case and A components where also set
to 0.
PR should be backported to 4.2
Pull Request: https://projects.blender.org/blender/blender/pulls/126630
Vulkan has different depth limits then opengl. We fix this by
retargetting the opengl limits into the limits that vulkan supports.
This is done in the vertex shader, or in the geometry shader. When
viewport index or layer isn't supported a geometry shader is added to
workaround these missing features. In this case the depth range was
could be done twice (vertex shader and geometry shader).
Pull Request: https://projects.blender.org/blender/blender/pulls/126634