Commit Graph

5037 Commits

Author SHA1 Message Date
Clément Foucault
23dce15f67 EEVEE-Next: Horizon Scan: Use Spherical harmonics
This uses Spherical Harmonics to store the indirect lighting and
distant lighting visibility.

We can then reuse this information for each closure which divide
the cost of it by 2 or 3 in many cases, doing the scanning once.

The storage cost is higher than previous method, so we split the
resolution scaling to be independant of raytracing.

The spatial filtering has been split to its own pass for performance
reason. Upsampling now only uses 4 bilinearly interpolated samples
(instead of 9) using bilateral weights to avoid bleeding.

This also add a missing dot product (which soften the lighting
around corners) and fixes the blocky artifacts seen at lower
resolution.

Pull Request: https://projects.blender.org/blender/blender/pulls/118924
2024-03-19 19:16:21 +01:00
Aras Pranckevicius
a05adbef28 BLF: optimizations and fixes to font shader
Simplifies/optimizes the "font" shader. It runs faster now too, but primarily
this is so that it loads/initializes faster.

* Instead of doing blur via individual bilinear samples (where each sample is 4
  texel fetches), do raw texel fetches of the kernel footprint and compute final
  result by shifting the kernel weights according to bilinear fraction weight.
  For 5x5 blur, this reduces number of texel fetches from 64 down to 36.
* Instead of checking "is the texel inside the glyph box? if so, then fetch it",
  first fetch it, and then set result to zero if it was outside. Simplifies the
  branching code flow in the compiled GPU shader.
* Avoid costly integer modulo/division for "unwrapping" the font texture. The
  texture width is always power of two size, so division/modulo can be replaced
  by masking and a shift. Setup uniforms to contain the needed data.

### Fixes

* The 3x3 blur was not doing a 3x3 blur, due to a copy-pasta typo (one of the
  sample offsets was repeated twice, and thus another sample offset was
  missing).
* Blur towards left/top edges of the glyphs had artifacts, because float->int
  casting in GLSL rounds towards zero, but the code actually wanted to round
  towards floor.

Image of how the blur has changed in the PR.

### First time initialization

* Windows 10, NVIDIA RTX 3080Ti, OpenGL: 274.4ms -> 51.3ms
* macOS, Apple M1 Max, Metal: 456ms -> 289ms (this is including PSO creation
  time).

### Shader performance/complexity

Performance I only measured on macOS (M1 Max), by making a BLF text that is
scaled up to cover most of screen via Python. Using Xcode Metal profiler,
drawing that text with 5x5 shadow blur: 1.5ms -> 0.3ms.

More performance analysis details in PR.

Pull Request: https://projects.blender.org/blender/blender/pulls/119653
2024-03-19 16:29:21 +01:00
Jason Fielder
661d12aef7 Fix #119195: Ensure Metal uses correct attribute conversion mode
Resolves custom attribute types for ints and booleans by ensuring
conversion mode is correct. Previously, the attribute declarations
were assumed to be linear. However, patch ensures the correct
attribute index is now fetched, ensuring the conversion mode
is correctly specified for non-linear attribute ID's.

Authored by Apple: Michael Parkin-White

Pull Request: https://projects.blender.org/blender/blender/pulls/119569
2024-03-18 13:38:09 +01:00
Jason Fielder
6768ded895 Fix #118868: Metal render pass output for EEVEE Next
Resolves render pass export for EEVEE Next on Metal.
Reads from texture views was previously utilising the
root texture rather than the view variant, resulting
in views into texture arrays being incorrectly sampled.

Authored by Apple: Michael Parkin-White

Pull Request: https://projects.blender.org/blender/blender/pulls/119563
2024-03-16 20:16:37 +01:00
Jason Fielder
6b56ed3cd3 Metal: Resolve artifact in EEVEE Next Film Cryptomatte
Cryptomatte passes would generate a feathered outline
in Metal due to missing texture fence in chained
read->modify->write->read->... patterns.

Added imageFence function to explicitly state that
imageStore's should be visible to future imageLoad's.

Authored by Apple: Michael Parkin-White

Pull Request: https://projects.blender.org/blender/blender/pulls/119163
2024-03-14 17:48:30 +01:00
Jason Fielder
ecffea86b1 Metal: Fix Storage buffer read sync affecting surfels
Authored by Apple: Michael Parkin-White

Pull Request: https://projects.blender.org/blender/blender/pulls/119093
2024-03-14 09:40:59 +01:00
Jeroen Bakker
f0f911590e EEVEE-Next: Viewport pixel size with up-sampling
EEVEE-Next performes less on integrated GPUs then discrete GPUs.
Most shaders have been analyzed, but there will always be bottlenecks
related to architectural differences.

In order to make EEVEE-Next run smooth on integrated GPUs this change
will implement viewport pixel size option similar to Cycles. The main difference
is that the samples will still be weighted and up-sampled to the final film
resolution. This makes the pixels not look squared in the viewport but will
resolve to something close to the results without up-scaling.

This improves the performance especially on integrated GPUs. The improvement
for discrete GPUs are less noticeable. See here the stats when playing
`rain_restaurant.blend` back on a RAPHAEL_MENDOCINO iGPU.

| Pixel size | Frames per second |
|------------|-------------------|
| 1x         | 0.25 FPS          |
| 2x         | 4.14 FPS          |
| 4x         | 6.90 FPS          |
| 8x         | 9.95 FPS          |

Related to: #114597
See PR for some example images.

Pull Request: https://projects.blender.org/blender/blender/pulls/118903
2024-03-13 12:00:24 +01:00
laurynas
aa3ffca8dc Fix #119247: Curves: Extra point in evaluated spline of Curves geometry
In bf17fc8d79  after extending buffer to multiple of 4 there appeared trailing
space in buffer not covered by shader's `for` loop.

Pull Request: https://projects.blender.org/blender/blender/pulls/119346
2024-03-12 15:01:10 +01:00
Jason Fielder
06ac33bdd2 Metal: Fix SSBO from VBO size assertion
Resolves assertion firing when creating an SSBO
from a VBO which is not aligned to 16 bytes.
Required to ensure API validation is satisfied.

Authored by Apple: Michael Parkin-White

Pull Request: https://projects.blender.org/blender/blender/pulls/119298
2024-03-11 08:28:14 +01:00
Prakhar-Singh-Chouhan
5d076e0e7b Vulkan: Implementing VKBackend::samplers_update()
Implemented `VKBackend::samplers_update()`. When triggered,
if the VK Device is initialized, the `device.samplers` are
freed and reinitialized.

Implements: #117019
Pull Request: https://projects.blender.org/blender/blender/pulls/119109
2024-03-11 07:57:52 +01:00
Jason Fielder
703353b5da Metal: Fix uniform upload for small types
This patch adds special cases to Shader::uniform_int routine
to allow writing of small types (1 bytes, 2 bytes) to the push
constant buffer.

This previously interpreted all incoming push constant data as
integer components only, resulting in rendering artifacts such as
bad SRGB mode selection and shader editor not rendering due
to mis-aligned overlay parameter, as the uniform assignment
would overflow consecutive small types.

Authored by Apple: Michael Parkin-White

Pull Request: https://projects.blender.org/blender/blender/pulls/119285
2024-03-10 19:36:30 +01:00
Campbell Barton
e33f5e36ac Cleanup: spacing around C-style comment blocks 2024-03-09 23:40:57 +11:00
Campbell Barton
32151abfc3 Cleanup: spelling in comments 2024-03-09 16:47:38 +11:00
Campbell Barton
b1c59a793c Cleanup: correct spelling for alignment 2024-03-09 16:43:34 +11:00
Clément Foucault
b8e726a158 GPU: Add support for small types
This implement the design of #118961.

- Add aliases in GLSL since theses types are
  not supported.
- Add detection mechanism that prevents usage
  inside shader shared code.

Check is only done in debug build to avoid slowing down
application startup.

Pull Request: https://projects.blender.org/blender/blender/pulls/119226
2024-03-08 23:28:15 +01:00
Clément Foucault
4205718dce GPU: Cleanup type aliases
This define all aliases for supported types,
document which one to use in C++ shared code,
move relevant defines to their backend file.

Rename `bool1` to `bool32_t` and cleanup
its usage as mentioned in #118961.

Rel. #118961

Pull Request: https://projects.blender.org/blender/blender/pulls/119098
2024-03-08 19:09:10 +01:00
Hans Goudey
1e1d7034ec Cleanup: Move GPU_uniform_buffer.h to C++ 2024-03-06 21:54:28 -05:00
Anthony Roberts
445fd42c61 Windows: Add ARM64 support
* Only works on machines with a Qualcomm Snapdragon 8cx Gen3 or above.
  Older generation devices are not and will not be supported due to
  some driver issues
* Requires VS2022 for building.
* Uses new MSVC preprocessor for sse2neon compatibility.
* SIMD is not enabled, waiting on conversion of blenlib to C++.

Ref #119126

Pull Request: https://projects.blender.org/blender/blender/pulls/117036
2024-03-06 16:14:34 +01:00
Omar Emara
eb91828aab GPU: Add maximum image units to GPU capabilities
This patch adds the maximum number of supported image units to the GPU
capabilities module. Currently, the GPU module assume a maximum of 8
units, so the patch is not currently particularly useful, but we can
consider committing it for the future anyways.

Pull Request: https://projects.blender.org/blender/blender/pulls/119057
2024-03-05 07:25:20 +01:00
Campbell Barton
ed5fb3eaba Cleanup: various non-functional C++ changes 2024-03-05 11:32:42 +11:00
Campbell Barton
76867ad4c2 Cleanup: redundant "void" in function declarations for C++ 2024-03-05 11:25:35 +11:00
laurynas
bf17fc8d79 Fix: GPU: Ensures length of curves GPUIndexBuf to be multiple of 4
Exception is thrown in gpu_storage_buffer.cc

To reproduce create legacy Bezier curve and convert it to new Curves.
Code is from #116617

Pull Request: https://projects.blender.org/blender/blender/pulls/118951
2024-03-03 16:39:11 +01:00
Sergey Sharybin
3fcd7ccbc0 Compositor: Enable lock-free GPU context activation on macOS
This required to apply a small fix in the Metal texture uploader.

Without synchronization pixels of a wrong pass can be uploaded to
a wrong texture. This is because this code path is heavily reusing
temporary allocations, and at some point the allocation is not
considered as still in use, unless the command buffer used by the
texture uploader is submitted.

Ref #118919

Pull Request: https://projects.blender.org/blender/blender/pulls/118920
2024-03-01 14:38:09 +01:00
Jeroen Bakker
a012aeafd5 Revert "Fix: GPU: Reduce GPU_MAX_ATTR from 15 to 14"
This reverts commit d9caa19ec2.

This commit doesn't compile, and when fixing the issues, doesn't start
blender.
2024-02-27 13:44:41 +01:00
Pratik Borhade
c926b65132 Merge branch 'blender-v4.1-release' 2024-02-27 17:43:50 +05:30
dupoxy
d9caa19ec2 Fix: GPU: Reduce GPU_MAX_ATTR from 15 to 14
This is to accommodate Position and Normal attributes.
The normal used to be optional but isn't nowadays.

So the limit is actually 14 attributes until we do some big refactoring of
the attribute fetching.

Pull Request: https://projects.blender.org/blender/blender/pulls/118441
2024-02-27 12:19:09 +01:00
Miguel Pozo
c713fbc2d3 GPU: Allow printing full shader source on compilation error
Add a define (DEBUG_LOG_SHADER_SRC_ON_ERROR ) in gpu_shader_private.h
to print the full source code of shaders that fail to compile.

Pull Request: https://projects.blender.org/blender/blender/pulls/116470
2024-02-26 17:30:15 +01:00
Jeroen Bakker
8dce2a422b EEVEE-Next: Specialization Constants for Film Accumulation
On lower end hardware the film accumulation has bad performance. Sometimes
upto 10ms. This PR improves the performance somewhat by adding a
specialization constant around the renderpasses that are actually needed for
rendering, the number of samples and if reprojection is enabled.

`enabled_categories`: Based on the enabled render passes some outer loops are
enabled/disabled that handle the specific render passes. This improves the performance
as no memory will be reserved for branches that are never accessed.

`samples_len` & `use_reprojection`: GPU compilers tend to optimize texture fetches
when they to the outer loop. This is only possible when the inner loop can be unrolled.
In the case of the film accumulation the inner loop couldn't be unrolled. By adding a
specialization constant would allow unrolling of the inner loop.

On old or low-end devices the improvement is around 40%. On newer devices
the improvement is 50+%. Performance of this shader is similar to
the godot.

| GPU                  | Before | New   |
|----------------------|--------|-------|
| NVIDIA GTX 760       | 3.5ms  | 2.4ms |
| GFX1036 (RDNA2 iGPU) | 9.9ms  | 6.2ms |
| AMD Radeon Pro W7500 | 2.1ms  | 0.9ms |

Pull Request: https://projects.blender.org/blender/blender/pulls/118385
2024-02-26 16:19:26 +01:00
Jeroen Bakker
3109564825 GPU: Fix shader compilation Metal
Metal uses an union to store the `gl_WorkGroupSize` the union needs to
be unpacked. We first unpack to uvec3 before in order to work around an
NVIDIA driver bug.

Issue introduced by: e3ac2ac93e

Pull Request: https://projects.blender.org/blender/blender/pulls/118749
2024-02-26 14:51:21 +01:00
Jeroen Bakker
e3ac2ac93e GPU: Shaders fail to compile on NVIDIA
NVIDIA fails with segmentation fault when compiling shaders due to recent changes.
This PR tweaks the shader code to work around the segmentation fault.

Issue introduced by: 7f43699ebf

Pull Request: https://projects.blender.org/blender/blender/pulls/118744
2024-02-26 13:01:10 +01:00
Eugene Kuznetsov
7f43699ebf DRW: Curves: Indexbuf optimization for large numbers of curves
This optimizes a few loops that become significant bottlenecks during
viewport rendering of scenes with large numbers of curves.

To render a curves object, Blender needs to generate a potentially
very large (but trivial) index buffer. As previously implemented,
this index buffer is generated in an extremely inefficient manner,
with a single-threaded loop and an explicit function call per entry.
The buffer then needs to be pushed onto the GPU, which is also a fairly
slow task.

The PR generates the index buffer directly on the GPU with compute
shader.

Pull Request: https://projects.blender.org/blender/blender/pulls/116617
2024-02-25 17:22:58 +01:00
Clément Foucault
06d3627c43 EEVEE-Next: Make closure evaluation fully type agnostic
The goal of this task is to remove noise in the most common material
layering configuration.

Subsequently, this also split the evaluation of different closure to
their own buffer to avoid discontinuity when denoising them.

This commit does a few things:
- [x] Removes use of global for closure random number.
- [x] Refactor the forward evaluation to be closure type agnostic.
- [x] Refactor the gbuffer lib to be closure type agnostic.
- [x] Reduces the number of picked closure to 3 maximum or less.
- [x] Use GPU_MATFLAG_COAT to tag the use of multiple usage of glossy BSDF.
- [x] Use two closure bin for Glossy when more than one.
- [x] Set closure bin per type for best noise level for most materials.
- [x] Change the gbuffer header to put the closure at their bin index.
- [x] Add a method to get a closure from the gbuffer from a specific bin.
- [x] Split lighting passes per Closure.

Pull Request: https://projects.blender.org/blender/blender/pulls/118079
2024-02-24 00:00:11 +01:00
Jeroen Bakker
5698fb2049 RenderDoc: Set Capture Title
Adds an option to set the capture title when using renderdoc
`GPU_debug_capture_begin` has an optional `title` parameter to set
the title of the renderdoc capture.

Pull Request: https://projects.blender.org/blender/blender/pulls/118649
2024-02-23 10:57:37 +01:00
Jeroen Bakker
e70e9e3cf9 GPU: Report on vertex attribute conversions
Blender uses some vertex attributes that are not (and sometimes
never) supported by a GPU. OpenGL silently converted these changes
but for Metal/Vulkan we need to convert then when uploading the
data.

This PR will write to console invalid usages which we should remove
from Blender code-base. Note it is still possible to create attributes
that still need conversions by using the PyGPU API.
2024-02-22 11:13:16 +01:00
Jason Fielder
eac8c381a0 Metal: Fix EEVEE sync issue on render
When a sync primitive signal existed
in its own command buffer, the command
buffer execution was skipped as the empty
flag was previously still set to true.

Authored by Apple: Michael Parkin-White

Pull Request: https://projects.blender.org/blender/blender/pulls/118557
2024-02-21 12:53:42 +01:00
Jeroen Bakker
e6eecdf614 EEVEE-Next: Voronoi colors are pure emissive
The voronoi texture node only sets the first 3 components of the
color. The alpha value is never set. Normally this is covered
when using it in a shader node, but when directly connected to
the AOV output, the color was stored as a pure emissive color.

This resulted in incorrect colors in the viewport and image renders.

This is a partial fix for #118494

Pull Request: https://projects.blender.org/blender/blender/pulls/118497
2024-02-21 11:32:29 +01:00
Campbell Barton
b6b00b61cb Cleanup: various non-functional changes for C++ 2024-02-21 10:33:56 +11:00
Jason Fielder
d1a9c2f650 Fix: Metal: EEVEE Next viewport motion blur
Resolves assertion for EEVEE Next motion
blur wherein a swizzled texture used in an
image binding loses write-access. We
instead must bind the source texture
for image write operations.

This is now consistent with expected
behaviour in other APIs.

Authored by Apple: Michael Parkin-White

Pull Request: https://projects.blender.org/blender/blender/pulls/117479
2024-02-20 11:17:12 +01:00
Jeroen Bakker
df2b5630d8 Vulkan: Update PCI ids
This change cleans up the PCI ids in the Vulkan backend.

- reuses already exising constants.
- add PCI-id for Apple devices.

Pull Request: https://projects.blender.org/blender/blender/pulls/118485
2024-02-20 10:44:11 +01:00
Jeroen Bakker
98bc3369f8 Cleanup: Silence unused parameter warning in Vulkan backend 2024-02-20 10:02:11 +01:00
Jeroen Bakker
5294381dae GPU: Fix compilation issues in shader builder 2024-02-20 08:07:02 +01:00
Brecht Van Lommel
0f2064bc3b Revert changes from main commits that were merged into blender-v4.1-release
The last good commit was 4bf6a2e564.
2024-02-19 15:59:59 +01:00
Iliya Katueshenock
9e12a675b5 Cleanup: Merge BKE_node.h into BKE_node.hh
Trivial change, just move all the code from `BKE_node.h` to `BKE_node.hh` header top.
No mixing code from different headers or namespace changes. Part of #117773

Pull Request: https://projects.blender.org/blender/blender/pulls/118407
2024-02-19 15:26:10 +01:00
Jeroen Bakker
2cb2d3944b RenderTest: Fix EEVEE Render Test
Panorama dicing test fails for EEVEE on legacy platforms. EEVEE creates a shader interface
that isn't compatible with the vulkan backend. This PR hides the check.

Check should be enabled again after EEVEE has been replaced by EEVEE-Next.
This PR also changes the behavior when checks are executed. It used to be
executed when blender was build with asserts. Now it is behind the --debug-gpu flag.

Pull Request: https://projects.blender.org/blender/blender/pulls/117992
2024-02-19 08:07:53 +01:00
Clément Foucault
8b6d145a6b Fix: Metal: Shader compilation logging with compute shader
It was missing the line directive treatment the
fragment and vertex shader had.
2024-02-16 22:52:59 +01:00
Brecht Van Lommel
7453c5ed67 Merge branch 'blender-v4.1-release' into main 2024-02-16 19:31:31 +01:00
Raul Fernandez
324ff4ddef macOS: Remove unnecessary checks now that minimum version is macOS 11.2
MacOS minimum version is now 11.2 we no longer need to check for lower API versions.

Pull Request: https://projects.blender.org/blender/blender/pulls/118388
2024-02-16 19:03:23 +01:00
Jeroen Bakker
c790e6e49d OpenGL: Reduce Shader Switches
Specialization constants was always switching shader even when the
constants were not changed. An early exit path was never taken.

The performance improvement should not be noticable to end users.
But would match with the intention of the design of specialization
constants.

Pull Request: https://projects.blender.org/blender/blender/pulls/118315
2024-02-16 12:26:10 +01:00
Campbell Barton
7582b15c4c Cleanup: spelling in comments 2024-02-16 14:26:46 +11:00
Campbell Barton
55adfdc7af Merge branch 'blender-v4.1-release' 2024-02-15 21:22:52 +11:00