Allows basic usage of templated functions.
There is no support for templated struct.
Benefit:
- More readable than macros in shader sources.
- Compatible with C++ tools.
- More sharing possible with host C++ code.
Requirements/Limitations:
- No default arguments to template parameters.
- Must use explicit instantiation for all variant needed.
- Explicit instantiation needs to **not** use argument deduction.
- Calls to template needs to have all template argument explicit
or all implicit.
- Template overload is not supported (redefining the same template
with different template argument or function argument types).
Currently implemented as Macros inside the build-time pre-pocessor,
but that could change to copy-paste to allow better error reporting.
However, the Macros keep the shader code reduced in the final binary
and allow different file to declare different instantiation.
The implementation is done by declaring overloads for each explicit
instantiation.
If a template has arguments not present in function
arguments, then all arguments **values** are appended to the
function name. The explicit template callsite is then modified to use
`TEMPLATE_GLUE` which will call the correct function. This is
why template argument deduction is not supported in this case.
Rel #137446
Pull Request: https://projects.blender.org/blender/blender/pulls/137441
This avoid having to guards functions that are
only available in fragment shader stage.
Calling the function inside another stage is still
invalid and will yield a compile error on Metal.
The vulkan and opengl glsl patch need to be modified
per stage to allow the fragment specific function
to be defined.
This is not yet widely used, but a good example is
the change in `film_display_depth_amend`.
Rel #137261
Pull Request: https://projects.blender.org/blender/blender/pulls/138280
This makes no sense to have in the draw namespace.
Also take the opportunity for making the coordinates
a float2 and rename them to something more descriptive.
Do this only when applicable.
This allow better compile time checking in Shader C++ compilation.
Moreover, this allows to have `constexpr` in shared code between
C++ and GLSL.
After investigation the `const` keyword in GLSL has the same
semantic than C/C++.
Rel #137333 and #137446
Pull Request: https://projects.blender.org/blender/blender/pulls/137497
This unify the C++ and GLSL codebase style.
The GLSL types are still in the backend compatibility
layers to support python shaders. However, the C++
shader compilation layer doesn't have them to enforce
correct type usage.
Note that this is going to break pretty much all PRs
in flight that targets shader code.
Rel #137261
Pull Request: https://projects.blender.org/blender/blender/pulls/137369
They are actually already some literals with the `f` suffix
that are in our shader codebase and we never had problem in
the past 5 years (or even 8 years).
So I think it is safe to do and improves convergence of codestyles.
Pull Request: https://projects.blender.org/blender/blender/pulls/137352
This patch refactors GPU shaders to remove includes to the utility
gpu_shader_common_math.glsl file. This is done because it has duplicate
functions that exist in other files, and it was really created for use
in GPU material nodes.
The safe_divide and hypot functions were removed since they exist in
gpu_shader_math_base_lib.glsl.
The compatible_[mod|pow] and wrap functions were moved into
gpu_shader_math_base_lib.glsl.
The floor_to_int function was inlined since it was trivial and only used
in one place.
The quick_floor was removed because it was unused.
The euler_to_mat3 function was replaced with the from_rotation function
from gpu_shader_math_matrix_lib.glsl.
Now the file only contains some GPU material node utility functions.
Pull Request: https://projects.blender.org/blender/blender/pulls/135160
This patches removes common_math_utils includes from compositor shaders
and replaces them with math lib includes. This involves moving some
functions from that file to to the math lib files.
Pull Request: https://projects.blender.org/blender/blender/pulls/135157
When compiling shaders using GCC there are warnings about functions
being declared twice. This PR will remove those warnings as they are
false positives. The warnings exists to identify typing errors.
Pull Request: https://projects.blender.org/blender/blender/pulls/134832
Continuation of #131332.
Including built-in headers in VS2019 ends up including `corecrt_math.h`
as a side effect, which has many functions that overlap in name with
our stubs.
This puts the conflicting functions inside its own namespace (`glsl`)
and declares macros for them.
(Note this has the side effect of not allowing us to use those as
variable names)
This also removes the `<cassert>` and `<cstdio>` includes.
Pull Request: https://projects.blender.org/blender/blender/pulls/131386
The goal is to reduce the startup time cost of
all of these parsing and string replacement.
All comments are now stripped at compile time.
This comment check added noticeable slowdown at
startup in debug builds and during preprocessing.
Put all metadatas between start and end token.
Use very simple parsing using `StringRef` and
hash all identifiers.
Move all the complexity to the preprocessor that
massagess the metadata into a well expected input
to the runtime parser.
All identifiers are compile time hashed so that no string
comparison is made at runtime.
Speed up the source loading:
- from 10ms to 1.6ms (6.25x speedup) in release
- from 194ms to 6ms (32.3x speedup) in debug
Follow up #129009
Pull Request: https://projects.blender.org/blender/blender/pulls/128927
Move most of the string preprocessing used for MSL
compatibility to `glsl_preprocess`.
Enforce some changes like matrix constructor and
array constructor to the GLSL codebase. This is
for C++ compatibility.
Additionally reduce the amount of code duplication
inside the compatibility code.
Pull Request: https://projects.blender.org/blender/blender/pulls/128634
This changes the include directive to use the standard C preprocessor
`#include` directive.
The regex to applied to all glsl sources is:
`pragma BLENDER_REQUIRE\((\w+\.glsl)\)`
`include "$1"`
This allow C++ linter to parse the code and allow easier codebase
traversal.
However there is a small catch. While it does work like a standard
include directive when the code is treated as C++, it doesn't when
compiled by our shader backends. In this case, we still use our
dependency concatenation approach instead of file injection.
This means that included files will always be prepended when compiled
to GLSL and a file cannot be appended more than once.
This is why all GLSL lib file should have the `#pragma once` directive
and always be included at the start of the file.
These requirements are actually already enforced by our code-style
in practice.
On the implementation, the source needed to be mutated to comment
the `#pragma once` and `#include`. This is needed to avoid GLSL
compiler error out as this is an extension that not all vendor
supports.
Rel #127983
Pull Request: https://projects.blender.org/blender/blender/pulls/128076
This adds feature parity with Cycles regarding light and shadow liking.
Technically, this extends the GBuffer header to 32 bits, and uses
the top bits to store the object's light set membership index.
The same index is also added to `ObjectInfo` in place of padding bytes.
For shadow linking, the shadow blocker sets bitmask is stored per
tilemap. It is then used during the GPU culling phase to cull objects
that do not belong to the shadow's sets.
Co-authored-by: Clément Foucault <foucault.clem@gmail.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/127514
`atan2(0, 0)` is undefined on many platforms. To ensure consistent
result across platforms, we return `0` in this case.
Note only the behavior of the shader node `Artan2` is changed here.
During shading, we might still produce `atan2(0, 0)` internally and
cause different results across platforms, but that usually happens with
single samples and is not obvious, plus checking this condition all the
time is costly. If later we find out it's indeed necessary to change all
the invocation of `atan2(0, 0)`, we could change the wrapper functions
in `metal/compat.h` and `mtl_shader_defines.msl`.
Pull Request: https://projects.blender.org/blender/blender/pulls/126951
VSE timeline, when many (hundreds/thousands) of thumbnails were visible, was
very slow to redraw. This PR makes them 3-10x faster to redraw, by stopping
doing things that are slow :) Part of #126087 thumbnail improvements task.
- No longer do mute semitransparency or corner rounding on the CPU, do it in
shader instead.
- Stop creating a separate GPU texture for each thumbnail, on every repaint,
and drawing each thumbnail as a separate draw call. Instead, put thumbnails
into a single texture atlas (using a simple shelf packing algorithm), and
draw them in batch, passing data via UBO. The atlas is still re-created every
frame, but that does not seem to be a performance issue. Thumbnails are
cropped horizontally based on how much of their parts are visible (e.g. a
narrow strip on screen), so realistically the atlas size is kinda
proportional to screen size, and ends up being just several megabytes of data
transfer between CPU -> GPU each frame.
On this Sprite Fright edit timeline view (612 visible thumbnails), time taken
to repaint the timeline window:
- Mac (M1 Max, Metal): 68.1ms -> 4.7ms
- Windows (Ryzen 5950X, RTX 3080Ti, OpenGL): 23.7ms -> 6.8ms
This also fixes a visual issue with thumbnails, where when strips are very
tall, the "rounded corners" that were poked right into the thumbnail bitmap
on the CPU were showing up due to actual bitmap being scaled up a lot.
Pull Request: https://projects.blender.org/blender/blender/pulls/126972
This includes the port of the edit edge shader to the new
primitive expansion API, removing split codepath and
code duplication.
Some of the shader code is duplicated for keeping the
legacy engine untouched.
Rel #102179
Pull Request: https://projects.blender.org/blender/blender/pulls/125921
This PR introduces the concept of primitive expansion draws.
This allows to create a drawcall that will generate N amount of new
primitive for an original primitive in a `gpu::Batch`. The intent is to
phase out the use of geometry shader for this purpose.
This adds a new `Frequency::GEOMETRY` only available for SSBOs.
The resources using this will be fed the current `gpu::Batch` VBOs
using name matching.
A dedicated slot is reserved for the index buffer, which has its own
internal lib to decode the index buffer content.
A new attribute lib is added to ease the loading of unaligned attribute.
This should be revisited and made obsolete once more refactor
lands.
It is similar to the Metal backend SSBO vertex fetch path but it is
defined on a different level. The main difference is that this PR is
backend independant and modify the draw module instead of the GPU
module. However, it doesn't cover all possible attribute conversion
cases. This will only be added if needed.
This system is less automatic than the Metal backend one and needs
more care to make sure the data matches what the shader expects.
The Metal system will be removed once all its usage have been
converted.
This PR only shows example usage for workbench shadows. Cleanup PRs
will follow this one.
Rel #105221
Pull Request: https://projects.blender.org/blender/blender/pulls/125782
This allows much easier debugging of shader programs.
Usage is as simple as adding `printf` calls inside shaders.
example: `printf("Formating %d\n", my_var);`
Contrary to the `drw_print`, this is not limited
to draw manager shader dispatch/draws. It is compatible
with any shader inside blender.
Most notably, this doesn't need a viewport to display.
So this can be used to debug render pipeline.
Data formating is currently limited to only `%x`, `%d`,
`%u` and `%f`. This could be easily extended if this is
really needed.
There is no type checking, so values are directly reinterpreted
as specified by the printf format.
The current approach for making this work is to bind a
storage buffer inside `GPU_shader_bind`, making it
available to any shader that needs it. The storage buffer
is downloaded back to CPU after a frame or a render
step and the content printed to the console.
This scheduling means that you cannot rely on these printfs
to detect crashes. We could add a mode to force flushing
at shader binding to avoid this limitation.
The values are written from the shaders in binary form and
only formated on the CPU. This avoid issues with manual
printing like with `drw_print`.
Pull Request: https://projects.blender.org/blender/blender/pulls/125071
The math render tests were not passing on the AMD hardware.
This was due to some compiler behavior not returning 1
on the `floor((a - c) / (b - c))` calculation even if
`a` and `b` were equal.
This patch implements a new Gabor noise node based on [1] but with the
improvements from [2] and the phasor formulation from [3].
We compare with the most popular existing implementation, that of OSL,
from the user's point of view:
- This implementation produces C1 continuous noise as opposed to the
non continuous OSL implementation, so it can be used for bump
mapping and is generally smother. This is achieved by windowing the
Gabor kernel using a Hann window.
- The Bandwidth input of OSL was hard-coded to 1 and was replaced with
a frequency input, which OSL hard codes to 2, since frequency is
more natural to control. This is even more true now that that Gabor
kernel is windowed as opposed to truncated, which means increasing
the bandwidth will just turn the Gaussian component of the Gabor
into a Hann window. While decreasing the bandwidth will eliminate
the harmonic from the Gabor kernel, which is the point of Gabor
noise.
- OSL had three discrete modes of operation for orienting the kernel.
Anisotropic, Isotropic, and a hybrid mode. While this implementation
provides a continuous Anisotropy parameter which users are already
familiar with from the Glossy BSDF node.
- This implementation provides not just the Gabor noise value, but
also its phase and intensity components. The Gabor noise value is
basically sin(phase) * intensity, but the phase is arguably more
useful since it does not suffer from the low contrast issues that
Gabor suffers from. While the intensity is useful to hide the
singularities in the phase.
- This implementation converges faster that OSL's relative to the
impulse count, so we fix the impulses count to 8 for simplicitly.
- This implementation does not implement anisotropic filtering.
Future improvements to the node includes implementing surface noise and
filtering. As well as extending the spectral control of the noise,
either by providing specialized kernels as was done in #110802, or by
providing some more procedural control over the frequencies of the
Gabor.
References:
[1]: Lagae, Ares, et al. "Procedural noise using sparse Gabor
convolution." ACM Transactions on Graphics (TOG) 28.3 (2009): 1-10.
[2]: Tavernier, Vincent, et al. "Making gabor noise fast and
normalized." Eurographics 2019-40th Annual Conference of the European
Association for Computer Graphics. 2019.
[3]: Tricard, Thibault, et al. "Procedural phasor noise." ACM
Transactions on Graphics (TOG) 38.4 (2019): 1-13.
Pull Request: https://projects.blender.org/blender/blender/pulls/121820
The direct lights are usually much smoother and with
higher dynamic range than indirect lighting. Using
the R11B11G10 float format exhibit color shifts and
banding even in simple setups without a way to mitigate
the issue.
Using RGB9_E5 encoding improve the quality while retaining
the storage benefit of 32bit formats. The added overhead
of the software encoding not perceptible in a full lighting
pass.
This affects direct lights and SSS convolution result.
Fix#121937
Pull Request: https://projects.blender.org/blender/blender/pulls/122515
The goal of this is to make it easier to add more BSDF
support in the future. Avoids code fragmentation and
allows easy entry points to all algorithms using BSDFs.
Pull Request: https://projects.blender.org/blender/blender/pulls/122255
The Perlin noise algorithms suffer from precision issues when a coordinate
is greater than about 250000.
To fix this the Perlin noise texture is repeated every 100000 on each axis.
This causes discontinuities every 100000, however at such scales this
usually shouldn't be noticeable.
Pull Request: https://projects.blender.org/blender/blender/pulls/119884
This uses Spherical Harmonics to store the indirect lighting and
distant lighting visibility.
We can then reuse this information for each closure which divide
the cost of it by 2 or 3 in many cases, doing the scanning once.
The storage cost is higher than previous method, so we split the
resolution scaling to be independant of raytracing.
The spatial filtering has been split to its own pass for performance
reason. Upsampling now only uses 4 bilinearly interpolated samples
(instead of 9) using bilateral weights to avoid bleeding.
This also add a missing dot product (which soften the lighting
around corners) and fixes the blocky artifacts seen at lower
resolution.
Pull Request: https://projects.blender.org/blender/blender/pulls/118924
The output of the Color Ramp node in the GPU compositor and EEVEE is
slightly off. That's because the factor is evaluated directly at the
sampler without proper half pixel offsets to account for the sampler's
linear interpolation, which this patch adds.
Pull Request: https://projects.blender.org/blender/blender/pulls/117677