Commit Graph

89 Commits

Author SHA1 Message Date
Miguel Pozo
bd1f4ec23c Fix: GPU: CPP shader errors in VS2019
Continuation of #131332.

Including built-in headers in VS2019 ends up including `corecrt_math.h`
as a side effect, which has many functions that overlap in name with
our stubs.
This puts the conflicting functions inside its own namespace (`glsl`)
and declares macros for them.
(Note this has the side effect of not allowing us to use those as
variable names)

This also removes the `<cassert>` and `<cstdio>` includes.

Pull Request: https://projects.blender.org/blender/blender/pulls/131386
2024-12-04 18:03:42 +01:00
Miguel Pozo
72aaaa0c24 Fix: GPU: Errors and warnings for CPP shaders in MSVC
Pull Request: https://projects.blender.org/blender/blender/pulls/131332
2024-12-04 17:33:12 +01:00
Clément Foucault
c0c816f846 GPU: GLSL compilation as C++ for workbench static shaders 2024-11-14 23:15:06 +01:00
Clément Foucault
091004f1b8 GPU: GLSL compilation as C++ for gpu static shaders
Allow compilation of shaders using C++ for linting and
IDE support.

Related #127983

Pull Request: https://projects.blender.org/blender/blender/pulls/128724
2024-11-12 18:53:34 +01:00
Clément Foucault
85f6350c0f Fix: GPU: Missing include breaking shader tests 2024-11-09 21:50:37 +01:00
Clément Foucault
0a3008172b Cleanup: GPU: Silence shader warnings 2024-11-08 00:28:08 +01:00
Omar Emara
ba5c6c8682 Compositor: Implement Chroma Matte for new CPU compositor
Reference #125968.
2024-10-25 11:25:55 +03:00
Clément Foucault
62826931b0 GPU: Move more linting and processing of GLSL to compile time
The goal is to reduce the startup time cost of
all of these parsing and string replacement.

All comments are now stripped at compile time.
This comment check added noticeable slowdown at
startup in debug builds and during preprocessing.

Put all metadatas between start and end token.
Use very simple parsing using `StringRef` and
hash all identifiers.

Move all the complexity to the preprocessor that
massagess the metadata into a well expected input
to the runtime parser.

All identifiers are compile time hashed so that no string
comparison is made at runtime.

Speed up the source loading:
- from 10ms to 1.6ms (6.25x speedup) in release
- from 194ms to 6ms (32.3x speedup) in debug

Follow up #129009

Pull Request: https://projects.blender.org/blender/blender/pulls/128927
2024-10-15 19:47:30 +02:00
Clément Foucault
9c0321ae9b Metal: Simplify MSL translation
Move most of the string preprocessing used for MSL
compatibility to `glsl_preprocess`.

Enforce some changes like matrix constructor and
array constructor to the GLSL codebase. This is
for C++ compatibility.

Additionally reduce the amount of code duplication
inside the compatibility code.

Pull Request: https://projects.blender.org/blender/blender/pulls/128634
2024-10-07 12:54:10 +02:00
Clément Foucault
42e8cbb921 GPU: Make use of the C++ stubs in some shaders 2024-10-07 12:35:47 +02:00
Clément Foucault
7e5bc58649 GPU: Change GLSL include directive
This changes the include directive to use the standard C preprocessor
`#include` directive.

The regex to applied to all glsl sources is:
`pragma BLENDER_REQUIRE\((\w+\.glsl)\)`
`include "$1"`

This allow C++ linter to parse the code and allow easier codebase
traversal.

However there is a small catch. While it does work like a standard
include directive when the code is treated as C++, it doesn't when
compiled by our shader backends. In this case, we still use our
dependency concatenation approach instead of file injection.

This means that included files will always be prepended when compiled
to GLSL and a file cannot be appended more than once.

This is why all GLSL lib file should have the `#pragma once` directive
and always be included at the start of the file.

These requirements are actually already enforced by our code-style
in practice.

On the implementation, the source needed to be mutated to comment
the `#pragma once` and `#include`. This is needed to avoid GLSL
compiler error out as this is an extension that not all vendor
supports.

Rel #127983
Pull Request: https://projects.blender.org/blender/blender/pulls/128076
2024-10-04 15:48:22 +02:00
Chris Clyne
5a27280916 EEVEE: Light & Shadow linking
This adds feature parity with Cycles regarding light and shadow liking.

Technically, this extends the GBuffer header to 32 bits, and uses
the top bits to store the object's light set membership index.
The same index is also added to `ObjectInfo` in place of padding bytes.

For shadow linking, the shadow blocker sets bitmask is stored per
tilemap. It is then used during the GPU culling phase to cull objects
that do not belong to the shadow's sets.

Co-authored-by: Clément Foucault <foucault.clem@gmail.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/127514
2024-10-03 18:41:06 +02:00
Clément Foucault
c82ddedb9b Overlay-Next: Image Space
Port all Image editor overlays.

Rel #102179

Pull Request: https://projects.blender.org/blender/blender/pulls/127366
2024-09-11 18:26:34 +02:00
Weizhen Huang
77035192c9 Fix #126799: undefined behavior of shader node Arctan2 at (0, 0)
`atan2(0, 0)` is undefined on many platforms. To ensure consistent
result across platforms, we return `0` in this case.

Note only the behavior of the shader node `Artan2` is changed here.
During shading, we might still produce `atan2(0, 0)` internally and
cause different results across platforms, but that usually happens with
single samples and is not obvious, plus checking this condition all the
time is costly. If later we find out it's indeed necessary to change all
the invocation of `atan2(0, 0)`, we could change the wrapper functions
in `metal/compat.h` and `mtl_shader_defines.msl`.

Pull Request: https://projects.blender.org/blender/blender/pulls/126951
2024-09-03 11:44:59 +02:00
Aras Pranckevicius
4c8f22447f VSE: Faster timeline thumbnail drawing
VSE timeline, when many (hundreds/thousands) of thumbnails were visible, was
very slow to redraw. This PR makes them 3-10x faster to redraw, by stopping
doing things that are slow :) Part of #126087 thumbnail improvements task.

- No longer do mute semitransparency or corner rounding on the CPU, do it in
  shader instead.
- Stop creating a separate GPU texture for each thumbnail, on every repaint,
  and drawing each thumbnail as a separate draw call. Instead, put thumbnails
  into a single texture atlas (using a simple shelf packing algorithm), and
  draw them in batch, passing data via UBO. The atlas is still re-created every
  frame, but that does not seem to be a performance issue. Thumbnails are
  cropped horizontally based on how much of their parts are visible (e.g. a
  narrow strip on screen), so realistically the atlas size is kinda
  proportional to screen size, and ends up being just several megabytes of data
  transfer between CPU -> GPU each frame.

On this Sprite Fright edit timeline view (612 visible thumbnails), time taken
to repaint the timeline window:

- Mac (M1 Max, Metal): 68.1ms -> 4.7ms
- Windows (Ryzen 5950X, RTX 3080Ti, OpenGL): 23.7ms -> 6.8ms

This also fixes a visual issue with thumbnails, where when strips are very
tall, the "rounded corners" that were poked right into the thumbnail bitmap
on the CPU were showing up due to actual bitmap being scaled up a lot.

Pull Request: https://projects.blender.org/blender/blender/pulls/126972
2024-09-03 08:25:15 +02:00
Clément Foucault
25b2c5f170 BLI: Add reduce_mul 2024-08-28 09:48:17 +02:00
Clément Foucault
2c275aec87 Overlay-Next: Mesh Edit Mode
This includes the port of the edit edge shader to the new
primitive expansion API, removing split codepath and
code duplication.

Some of the shader code is duplicated for keeping the
legacy engine untouched.

Rel #102179

Pull Request: https://projects.blender.org/blender/blender/pulls/125921
2024-08-09 16:29:59 +02:00
Campbell Barton
c071030ac3 Cleanup: spelling in comments 2024-08-04 13:45:06 +10:00
Clément FOUCAULT
d712f91881 DRW: Primitive Expansion
This PR introduces the concept of primitive expansion draws.
This allows to create a drawcall that will generate N amount of new
primitive for an original primitive in a `gpu::Batch`. The intent is to
phase out the use of geometry shader for this purpose.

This adds a new `Frequency::GEOMETRY` only available for SSBOs.
The resources using this will be fed the current `gpu::Batch` VBOs
using name matching.

A dedicated slot is reserved for the index buffer, which has its own
internal  lib to decode the index buffer content.

A new attribute lib is added to ease the loading of unaligned attribute.
This should be revisited and made obsolete once more refactor
lands.

It is similar to the Metal backend SSBO vertex fetch path but it is
defined on a different level. The main difference is that this PR is
backend independant and modify the draw module instead of the GPU
module. However, it doesn't cover all possible attribute conversion
cases. This will only be added if needed.

This system is less automatic than the Metal backend one and needs
more care to make sure the data matches what the shader expects.
The Metal system will be removed once all its usage have been
converted.

This PR only shows example usage for workbench shadows. Cleanup PRs
will follow this one.

Rel #105221

Pull Request: https://projects.blender.org/blender/blender/pulls/125782
2024-08-03 11:06:17 +02:00
Clément Foucault
d2fdb22b93 GPU: Add support for in shader printf
This allows much easier debugging of shader programs.

Usage is as simple as adding `printf` calls inside shaders.
example: `printf("Formating %d\n", my_var);`

Contrary to the `drw_print`, this is not limited
to draw manager shader dispatch/draws. It is compatible
with any shader inside blender.
Most notably, this doesn't need a viewport to display.
So this can be used to debug render pipeline.

Data formating is currently limited to only `%x`, `%d`,
`%u` and `%f`. This could be easily extended if this is
really needed.

There is no type checking, so values are directly reinterpreted
as specified by the printf format.

The current approach for making this work is to bind a
storage buffer inside `GPU_shader_bind`, making it
available to any shader that needs it. The storage buffer
is downloaded back to CPU after a frame or a render
step and the content printed to the console.

This scheduling means that you cannot rely on these printfs
to detect crashes. We could add a mode to force flushing
at shader binding to avoid this limitation.

The values are written from the shaders in binary form and
only formated on the CPU. This avoid issues with manual
printing like with `drw_print`.

Pull Request: https://projects.blender.org/blender/blender/pulls/125071
2024-07-19 15:48:00 +02:00
Jeroen Bakker
14fb537eca Merge branch 'blender-v4.2-release' 2024-07-12 15:41:39 +02:00
Jeroen Bakker
8ac023da61 Fix #124530: EEVEE: Math wrap function not working
Due to incorrect check the result was always returning the min
parameter.

Found issue by comparing the implementation with cycles.
Regression introduced by 7fe7b2eed0

Pull Request: https://projects.blender.org/blender/blender/pulls/124604
2024-07-12 15:40:12 +02:00
Clément Foucault
acf7eab3b5 Merge branch 'blender-v4.2-release'
# Conflicts:
#	tests/data
2024-07-09 14:52:34 +02:00
Clément Foucault
7fe7b2eed0 Fix: EEVEE: Hardware discrepancy with math wrap function
The math render tests were not passing on the AMD hardware.
This was due to some compiler behavior not returning 1
on the `floor((a - c) / (b - c))` calculation even if
`a` and `b` were equal.
2024-07-09 14:13:32 +02:00
Omar Emara
4f51033708 Nodes: Implement Gabor noise
This patch implements a new Gabor noise node based on [1] but with the
improvements from [2] and the phasor formulation from [3].

We compare with the most popular existing implementation, that of OSL,
from the user's point of view:

  - This implementation produces C1 continuous noise as opposed to the
    non continuous OSL implementation, so it can be used for bump
    mapping and is generally smother. This is achieved by windowing the
    Gabor kernel using a Hann window.

  - The Bandwidth input of OSL was hard-coded to 1 and was replaced with
    a frequency input, which OSL hard codes to 2, since frequency is
    more natural to control. This is even more true now that that Gabor
    kernel is windowed as opposed to truncated, which means increasing
    the bandwidth will just turn the Gaussian component of the Gabor
    into a Hann window. While decreasing the bandwidth will eliminate
    the harmonic from the Gabor kernel, which is the point of Gabor
    noise.

  - OSL had three discrete modes of operation for orienting the kernel.
    Anisotropic, Isotropic, and a hybrid mode. While this implementation
    provides a continuous Anisotropy parameter which users are already
    familiar with from the Glossy BSDF node.

  - This implementation provides not just the Gabor noise value, but
    also its phase and intensity components. The Gabor noise value is
    basically sin(phase) * intensity, but the phase is arguably more
    useful since it does not suffer from the low contrast issues that
    Gabor suffers from. While the intensity is useful to hide the
    singularities in the phase.

  - This implementation converges faster that OSL's relative to the
    impulse count, so we fix the impulses count to 8 for simplicitly.

  - This implementation does not implement anisotropic filtering.

Future improvements to the node includes implementing surface noise and
filtering. As well as extending the spectral control of the noise,
either by providing specialized kernels as was done in #110802, or by
providing some more procedural control over the frequencies of the
Gabor.

References:

[1]: Lagae, Ares, et al. "Procedural noise using sparse Gabor
convolution." ACM Transactions on Graphics (TOG) 28.3 (2009): 1-10.

[2]: Tavernier, Vincent, et al. "Making gabor noise fast and
normalized." Eurographics 2019-40th Annual Conference of the European
Association for Computer Graphics. 2019.

[3]: Tricard, Thibault, et al. "Procedural phasor noise." ACM
Transactions on Graphics (TOG) 38.4 (2019): 1-13.

Pull Request: https://projects.blender.org/blender/blender/pulls/121820
2024-06-19 09:33:32 +02:00
Clément Foucault
3ed825f981 EEVEE-Next: Use RGB9_E5 encoding for storing direct light buffers
The direct lights are usually much smoother and with
higher dynamic range than indirect lighting. Using
the R11B11G10 float format exhibit color shifts and
banding even in simple setups without a way to mitigate
the issue.

Using RGB9_E5 encoding improve the quality while retaining
the storage benefit of 32bit formats. The added overhead
of the software encoding not perceptible in a full lighting
pass.

This affects direct lights and SSS convolution result.

Fix #121937

Pull Request: https://projects.blender.org/blender/blender/pulls/122515
2024-05-30 20:41:38 +02:00
Clément Foucault
a94b8ade20 GPU: Add library for handling shared exponent format in software
This allows reducing bandwidth at the cost of some instructions
for packing and decoding the texture.

Pull Request: https://projects.blender.org/blender/blender/pulls/122446
2024-05-30 19:59:18 +02:00
Clément Foucault
4624b1a9ae Cleanup: EEVEE-Next: Group BSDF functions to per BSDF type files
The goal of this is to make it easier to add more BSDF
support in the future. Avoids code fragmentation and
allows easy entry points to all algorithms using BSDFs.

Pull Request: https://projects.blender.org/blender/blender/pulls/122255
2024-05-25 23:40:12 +02:00
Clément Foucault
732d0c56df GPU: Add atan_fast 2024-05-09 12:33:13 +02:00
Hoshinova
c78c6b0bdf Fix #119797: Noise Texture Precision Issues
The Perlin noise algorithms suffer from precision issues when a coordinate
is greater than about 250000.

To fix this the Perlin noise texture is repeated every 100000 on each axis.
This causes discontinuities every 100000, however at such scales this
usually shouldn't be noticeable.

Pull Request: https://projects.blender.org/blender/blender/pulls/119884
2024-03-29 16:12:23 +01:00
Miguel Pozo
def5f86cae Fix: EEVEE-Next: Material compilation
Move pcg functions to eevee_sampling_lib.
Including gpu_shader_common libs in engine code results in double  includes.
2024-03-22 18:58:12 +01:00
Miguel Pozo
3888bdf8b2 EEVEE-Next: Fix transparent shadows convergence
Replace the hashed alpha function in shadows for a fully random one.
Add pcg functions to `gpu_shader_common_hash.glsl`
(Split from #119480)

Pull Request: https://projects.blender.org/blender/blender/pulls/119526
2024-03-20 16:05:07 +01:00
Clément Foucault
23dce15f67 EEVEE-Next: Horizon Scan: Use Spherical harmonics
This uses Spherical Harmonics to store the indirect lighting and
distant lighting visibility.

We can then reuse this information for each closure which divide
the cost of it by 2 or 3 in many cases, doing the scanning once.

The storage cost is higher than previous method, so we split the
resolution scaling to be independant of raytracing.

The spatial filtering has been split to its own pass for performance
reason. Upsampling now only uses 4 bilinearly interpolated samples
(instead of 9) using bilateral weights to avoid bleeding.

This also add a missing dot product (which soften the lighting
around corners) and fixes the blocky artifacts seen at lower
resolution.

Pull Request: https://projects.blender.org/blender/blender/pulls/118924
2024-03-19 19:16:21 +01:00
Omar Emara
51aac62006 Fix: Output of Color Ramp node is slightly off
The output of the Color Ramp node in the GPU compositor and EEVEE is
slightly off. That's because the factor is evaluated directly at the
sampler without proper half pixel offsets to account for the sampler's
linear interpolation, which this patch adds.

Pull Request: https://projects.blender.org/blender/blender/pulls/117677
2024-01-31 10:48:46 +01:00
Hans Goudey
6438d0ad1f Cleanup: Grammar in comments 2024-01-11 11:01:50 -05:00
Campbell Barton
0ba83fde1f Cleanup: spelling in comments 2024-01-08 11:24:37 +11:00
Omar Emara
dc082f432a Fix #116522: Compositor incorrectly extrapolates values
The GPU compositor incorrectly extrapolates values of RGBA curves node.
That's because the code introduces a half-pixel offset to the color
values since they will be used to sample the curve maps. Those same
values are then used for extrapolation, which shouldn't take the
half-pixel value into account.

This patch fixes that by computing sampler coordinate in a separate
step.

Pull Request: https://projects.blender.org/blender/blender/pulls/116586
2023-12-28 09:25:11 +01:00
Miguel Pozo
6c40adcc36 Fix: GPU: from_up_axis
glsl sign can return 0.
Fixes surfels display when normal.z == 0.
2023-12-13 17:47:48 +01:00
Campbell Barton
9898602e9d Cleanup: clarify #ifndef checks in trailing #endif comments 2023-12-07 10:38:54 +11:00
Clément Foucault
fe848ce3ef EEVEE-Next: Optimize GBuffer Layout and writting
This layout is more flexible and polymorphic.

While the worst case is worse (4 + 3 layers),
the common case is more optimized (2 + 2 layers).
The average written closure data is also lower
since we can compact the data for special cases
which are quite frequent.

Some adjustment had to be made in the denoise an
tile classify shaders.

Pull Request: https://projects.blender.org/blender/blender/pulls/115541
2023-12-01 14:41:13 +01:00
Campbell Barton
1eff48a838 Cleanup: spelling in code 2023-11-27 10:55:39 +11:00
Clément Foucault
f79b86553a EEVEE-Next: Add mesh volume bounds estimation
This adds correct object bounds estimation.

This works by creating an occupancy texture where one
bit represents one froxel. A geometry pre-pass fill this
occupancy texture and doesn't do any shading. Each bit
set to 0 will not be considered occupied by the object
volume and will discard the material compute shader for
this froxel.

There is 2 method of computing the occupancy map:
- Atomic XOR: For each fragment we compute the amount of
  froxels **center** in-front of it. We then convert that
  into occupancy bitmask that we apply to the occupancy
  texture using `imageAtomicXor`. This is straight forward
  and works well for any manifold geometry.
- Hit List: For each fragment we write the fragment depth
  in a list (contained in one array texture). This list
  is then processed by a fullscreen pass (see
  `eevee_occupancy_convert_frag.glsl`) that sorts and
  converts all the hits to the occupancy bits. This
  emulate Cycles behavior by considering only back-face
  hits as exit events and front-face hits as entry events.
  The result stores it to the occupancy texture using
  bit-wise `OR` operation to compose it with other non-hit
  list objects. This also decouple the hit-list evaluation
  complexity from the material evaluation shader.

## Limitations
### Fast
- Non-manifolds geometry objects are rendered incorrectly.
- Non-manifolds geometry objects will affect other objects
  in front of them.
### Accurate
- Limited to 16 hits per layer for now.
- Non-manifolds geometry objects will affect other objects
  in front of them.

Pull Request: https://projects.blender.org/blender/blender/pulls/113731
2023-10-19 19:22:14 +02:00
Campbell Barton
2e0b844b36 Cleanup: spelling in comments 2023-10-14 13:53:00 +11:00
Clément Foucault
71dfcf4558 EEVEE-Next: Remove common lib usage
Replaces all usage by the the gpu_shader_math
equivalent. This is because the old shader
library was quite tangled.

This avoids dependency hell trying to
mix libraries.

Changes are split into isolated commits until
I had to do mass changes because of inter-
dependencies.

Pull Request: https://projects.blender.org/blender/blender/pulls/113631
2023-10-13 17:59:46 +02:00
Clément Foucault
f4e584b02a BLI: Math Vector: Add reduce_min/max/add and average function
This is a straightforward port from Cycles functions.

They are written to work even with vector of size 1.

Pull Request: https://projects.blender.org/blender/blender/pulls/113678
2023-10-13 16:48:19 +02:00
Campbell Barton
e86fbcd4f0 Merge branch 'blender-v4.0-release' 2023-10-13 10:31:44 +11:00
Campbell Barton
fb58aa5900 Cleanup: typos, duplicate words 2023-10-13 10:21:06 +11:00
Clément Foucault
9d229aee19 Math: Add from_up_axis matrix creation function
This add the possibility to create a
orthogonal basis around a given unit
vector.

The name was chosen to match the naming
convention already in place and match
the other matrix construction functions.
In other places (ex: renderers), this same
function is commonly named `make_orthonormal`
or `make_basis`.

The function is not given to have a fixed
implementation and might change overtime.
That's why the test only covers the
assumptions and not the raw values.

The implementation is borrowed from
Cycles and adapted to our math API.

Pull Request: https://projects.blender.org/blender/blender/pulls/113218
2023-10-04 14:35:47 +02:00
Clément Foucault
3a4fc2c94e EEVEE-Next: Shadow Map Tracing Initial Implementation
Shadow Map Ray Tracing is a technique that ray cast against the shadow
depth buffer. The technique is described in "Soft Shadows by
Ray Tracing Multilayer Transparent Shadow Maps".
Note that we only implement the single layer approach since storing
multiple depth is prohibitively expensive.

Pull Request: https://projects.blender.org/blender/blender/pulls/111809
2023-09-26 23:42:40 +02:00
Campbell Barton
3082037743 Cleanup: spelling in comments 2023-09-03 16:15:01 +10:00