Continuation of #131332.
Including built-in headers in VS2019 ends up including `corecrt_math.h`
as a side effect, which has many functions that overlap in name with
our stubs.
This puts the conflicting functions inside their own namespace (`glsl`)
and declares macros for them.
(Note this has the side effect of preventing us from using those names
as variable names.)
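For illustration, a minimal sketch of the pattern; the real header wraps
the full set of conflicting `corecrt_math.h` functions:

```cpp
namespace glsl {
/* Stub that would otherwise collide with `floor` from `corecrt_math.h`.
 * Sketch implementation for illustration only. */
inline float floor(float x)
{
  long long t = static_cast<long long>(x);
  return static_cast<float>(t - ((x < 0.0f && x != static_cast<float>(t)) ? 1 : 0));
}
}  // namespace glsl

/* Redirect unqualified uses to the namespaced stub. As noted above, this
 * also prevents using `floor` as a variable name. */
#define floor glsl::floor
```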
This also removes the `<cassert>` and `<cstdio>` includes.
Pull Request: https://projects.blender.org/blender/blender/pulls/131386
The goal is to reduce the startup time cost of
all of this parsing and string replacement.
All comments are now stripped at compile time, as the comment
check added a noticeable slowdown at startup in debug builds
and during preprocessing.
Put all metadata between start and end tokens.
Use very simple parsing using `StringRef` and
hash all identifiers.
Move all the complexity to the preprocessor, which
massages the metadata into a well-formed input
for the runtime parser.
All identifiers are compile-time hashed so that no string
comparison is made at runtime.
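For illustration, a compile-time hash in the FNV-1a style; the actual
hash function and token names used by the preprocessor may differ:

```cpp
#include <cstdint>
#include <string_view>

constexpr uint32_t hash_string(std::string_view str)
{
  uint32_t hash = 2166136261u; /* FNV-1a offset basis. */
  for (char c : str) {
    hash = (hash ^ uint8_t(c)) * 16777619u; /* FNV-1a prime. */
  }
  return hash;
}

/* The runtime parser compares precomputed hashes instead of strings.
 * The token name here is a hypothetical example. */
bool is_metadata_start(std::string_view token)
{
  return hash_string(token) == hash_string("GPU_METADATA_BEGIN");
}
```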
Speed up the source loading:
- from 10ms to 1.6ms (6.25x speedup) in release
- from 194ms to 6ms (32.3x speedup) in debug
Follow-up to #129009
Pull Request: https://projects.blender.org/blender/blender/pulls/128927
Move most of the string preprocessing used for MSL
compatibility to `glsl_preprocess`.
Enforce some changes, like matrix and array
constructors, in the GLSL codebase. This is
for C++ compatibility.
Additionally reduce the amount of code duplication
inside the compatibility code.
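For instance, GLSL's `float[2](0.0, 1.0)` array constructor is not valid
C++. A sketch of the kind of call-style constructor that both languages
can digest; the actual helpers enforced by the preprocessor may differ:

```cpp
template<typename T, int N> struct Array {
  T data[N];
};

/* Call-style array constructor, valid in C++ and translatable to GLSL. */
inline Array<float, 2> float_array(float a, float b)
{
  return {{a, b}};
}

/* Usage: replaces `float weights[2] = float[2](0.0, 1.0);`. */
Array<float, 2> weights = float_array(0.0f, 1.0f);
```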
Pull Request: https://projects.blender.org/blender/blender/pulls/128634
This changes the include directive to use the standard C preprocessor
`#include` directive.
The regex applied to all GLSL sources is:
`pragma BLENDER_REQUIRE\((\w+\.glsl)\)`
`include "$1"`
This allows C++ linters to parse the code and eases codebase
traversal.
However there is a small catch. While it does work like a standard
include directive when the code is treated as C++, it doesn't when
compiled by our shader backends. In this case, we still use our
dependency concatenation approach instead of file injection.
This means that included files will always be prepended when compiling
to GLSL and a file cannot be included more than once.
This is why all GLSL lib files should have the `#pragma once` directive
and always be included at the start of the file.
These requirements are actually already enforced by our code-style
in practice.
On the implementation side, the source needs to be mutated to comment
out the `#pragma once` and `#include` directives. This is needed to
avoid GLSL compilers erroring out, as these directives are an extension
that not all vendors support.
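A sketch of that mutation step, assuming a simple line-based scan; the
real implementation in the GPU module may differ:

```cpp
#include <sstream>
#include <string>

std::string disable_include_directives(const std::string &source)
{
  std::istringstream input(source);
  std::ostringstream output;
  std::string line;
  while (std::getline(input, line)) {
    /* Comment out directives that some GLSL compilers reject, keeping the
     * line count intact so error messages still point to the right line. */
    if (line.rfind("#pragma once", 0) == 0 || line.rfind("#include", 0) == 0) {
      output << "// " << line << "\n";
    }
    else {
      output << line << "\n";
    }
  }
  return output.str();
}
```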
Rel #127983
Pull Request: https://projects.blender.org/blender/blender/pulls/128076
This adds feature parity with Cycles regarding light and shadow linking.
Technically, this extends the GBuffer header to 32 bits, and uses
the top bits to store the object's light set membership index.
The same index is also added to `ObjectInfo` in place of padding bytes.
For shadow linking, the bitmask of shadow blocker sets is stored per
tilemap. It is then used during the GPU culling phase to cull objects
that do not belong to the shadow's sets.
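Sketched below is the kind of bit packing involved; the bit counts are
illustrative assumptions, not the exact GBuffer layout:

```cpp
#include <cstdint>

/* Hypothetical split: top 6 bits of the 32-bit header hold the index. */
constexpr uint32_t LIGHT_SET_BITS = 6u;
constexpr uint32_t LIGHT_SET_SHIFT = 32u - LIGHT_SET_BITS;

uint32_t header_pack_light_set(uint32_t header, uint32_t light_set_index)
{
  return header | (light_set_index << LIGHT_SET_SHIFT);
}

uint32_t header_unpack_light_set(uint32_t header)
{
  return header >> LIGHT_SET_SHIFT;
}
```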
Co-authored-by: Clément Foucault <foucault.clem@gmail.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/127514
`atan2(0, 0)` is undefined on many platforms. To ensure consistent
results across platforms, we return `0` in this case.
Note only the behavior of the shader node `Arctan2` is changed here.
During shading, we might still produce `atan2(0, 0)` internally and
cause different results across platforms, but that usually happens with
single samples and is not obvious, plus checking this condition all the
time is costly. If later we find out it's indeed necessary to change all
the invocations of `atan2(0, 0)`, we could change the wrapper functions
in `metal/compat.h` and `mtl_shader_defines.msl`.
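A sketch of the node-level guard; the wrapper name is hypothetical:

```cpp
#include <cmath>

float node_safe_atan2(float y, float x)
{
  /* atan2(0, 0) is undefined on many platforms; pick 0 for consistency. */
  if (x == 0.0f && y == 0.0f) {
    return 0.0f;
  }
  return std::atan2(y, x);
}
```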
Pull Request: https://projects.blender.org/blender/blender/pulls/126951
The VSE timeline was very slow to redraw when many (hundreds to
thousands of) thumbnails were visible. This PR makes them 3-10x faster
to redraw, by stopping doing things that are slow :) Part of #126087
thumbnail improvements task.
- No longer do mute semi-transparency or corner rounding on the CPU; do it
in the shader instead.
- Stop creating a separate GPU texture for each thumbnail on every repaint,
and stop drawing each thumbnail as a separate draw call. Instead, put
thumbnails into a single texture atlas (using a simple shelf packing
algorithm; see the sketch below), and draw them in a batch, passing data
via UBO. The atlas is still re-created every frame, but that does not seem
to be a performance issue. Thumbnails are cropped horizontally based on how
much of them is visible (e.g. a narrow strip on screen), so realistically
the atlas size is roughly proportional to screen size, and ends up being
just several megabytes of data transfer from CPU to GPU each frame.
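A minimal sketch of shelf packing, assuming thumbnails are packed row by
row into a fixed-size atlas; the actual packer may differ in details:

```cpp
#include <algorithm>
#include <optional>

struct Rect {
  int x, y, w, h;
};

/* Packs rectangles left to right into horizontal shelves; when a shelf
 * fills up, a new one is opened below it. */
class ShelfPacker {
 public:
  ShelfPacker(int atlas_w, int atlas_h) : atlas_w_(atlas_w), atlas_h_(atlas_h) {}

  std::optional<Rect> pack(int w, int h)
  {
    if (cursor_x_ + w > atlas_w_) {
      shelf_y_ += shelf_h_; /* Current shelf is full: open a new one. */
      cursor_x_ = 0;
      shelf_h_ = 0;
    }
    if (w > atlas_w_ || shelf_y_ + h > atlas_h_) {
      return std::nullopt; /* Atlas is full. */
    }
    Rect rect = {cursor_x_, shelf_y_, w, h};
    cursor_x_ += w;
    shelf_h_ = std::max(shelf_h_, h);
    return rect;
  }

 private:
  int atlas_w_, atlas_h_;
  int cursor_x_ = 0, shelf_y_ = 0, shelf_h_ = 0;
};
```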
On this Sprite Fright edit timeline view (612 visible thumbnails), time taken
to repaint the timeline window:
- Mac (M1 Max, Metal): 68.1ms -> 4.7ms
- Windows (Ryzen 5950X, RTX 3080Ti, OpenGL): 23.7ms -> 6.8ms
This also fixes a visual issue with thumbnails: when strips are very
tall, the "rounded corners" that were poked right into the thumbnail
bitmap on the CPU were showing up due to the actual bitmap being scaled
up a lot.
Pull Request: https://projects.blender.org/blender/blender/pulls/126972
This includes the port of the edit edge shader to the new
primitive expansion API, removing the split codepath and
code duplication.
Some of the shader code is duplicated to keep the
legacy engine untouched.
Rel #102179
Pull Request: https://projects.blender.org/blender/blender/pulls/125921
This PR introduces the concept of primitive expansion draws.
This allows creating a drawcall that will generate N new
primitives for each original primitive in a `gpu::Batch`. The intent is
to phase out the use of geometry shaders for this purpose.
This adds a new `Frequency::GEOMETRY` only available for SSBOs.
The resources using this will be fed the current `gpu::Batch` VBOs
using name matching.
A dedicated slot is reserved for the index buffer, which has its own
internal lib to decode the index buffer content.
A new attribute lib is added to ease the loading of unaligned
attributes.
This should be revisited and made obsolete once more refactors
land.
It is similar to the Metal backend SSBO vertex fetch path, but it is
defined on a different level. The main difference is that this PR is
backend independent and modifies the draw module instead of the GPU
module. However, it doesn't cover all possible attribute conversion
cases. These will only be added if needed.
This system is less automatic than the Metal backend one and needs
more care to make sure the data matches what the shader expects.
The Metal system will be removed once all its usages have been
converted.
This PR only shows example usage for workbench shadows. Cleanup PRs
will follow this one.
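A sketch of the index math such a draw implies; the names and exact
mapping are illustrative assumptions, not the module's actual code:

```cpp
#include <cstdint>

struct ExpansionCoord {
  uint32_t src_primitive; /* Primitive in the original `gpu::Batch`. */
  uint32_t expansion;     /* Which of the N generated primitives. */
  uint32_t corner;        /* Vertex within the generated primitive. */
};

/* Map an expanded vertex index back to its source primitive when each
 * source primitive is expanded into `expansion_count` output primitives. */
ExpansionCoord expansion_coord(uint32_t vertex_id,
                               uint32_t verts_per_prim,
                               uint32_t expansion_count)
{
  uint32_t out_prim = vertex_id / verts_per_prim;
  return {out_prim / expansion_count,
          out_prim % expansion_count,
          vertex_id % verts_per_prim};
}
```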
Rel #105221
Pull Request: https://projects.blender.org/blender/blender/pulls/125782
This allows much easier debugging of shader programs.
Usage is as simple as adding `printf` calls inside shaders.
example: `printf("Formatting %d\n", my_var);`
Contrary to `drw_print`, this is not limited
to draw manager shader dispatches/draws. It is compatible
with any shader inside Blender.
Most notably, this doesn't need a viewport to display,
so it can be used to debug the render pipeline.
Data formatting is currently limited to `%x`, `%d`,
`%u` and `%f`. This could be easily extended if
really needed.
There is no type checking, so values are directly reinterpreted
as specified by the printf format.
The current approach for making this work is to bind a
storage buffer inside `GPU_shader_bind`, making it
available to any shader that needs it. The storage buffer
is downloaded back to the CPU after a frame or a render
step and the content printed to the console.
This scheduling means that you cannot rely on these printfs
to detect crashes. We could add a mode to force flushing
at shader binding to avoid this limitation.
The values are written from the shaders in binary form and
only formatted on the CPU. This avoids issues with manual
printing like with `drw_print`.
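A sketch of the CPU-side formatting, assuming the shader wrote the
format's arguments as raw 32-bit words; the actual buffer layout in the
GPU module may differ:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

void print_shader_words(const char *format, const std::vector<uint32_t> &words)
{
  size_t arg = 0;
  for (const char *c = format; *c != '\0'; c++) {
    if (*c != '%' || c[1] == '\0' || arg >= words.size()) {
      fputc(*c, stdout);
      continue;
    }
    uint32_t word = words[arg++];
    c++;
    switch (*c) {
      case 'd':
        printf("%d", int32_t(word));
        break;
      case 'u':
        printf("%u", word);
        break;
      case 'x':
        printf("%x", word);
        break;
      case 'f': {
        /* No type checking: raw bits are reinterpreted per the format. */
        float value;
        std::memcpy(&value, &word, sizeof(value));
        printf("%f", value);
        break;
      }
    }
  }
}
```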
Pull Request: https://projects.blender.org/blender/blender/pulls/125071
The math render tests were not passing on AMD hardware.
This was due to some compiler behavior not returning 1
on the `floor((a - c) / (b - c))` calculation even when
`a` and `b` were equal.
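One possible shape of the workaround, handling the equality case
explicitly; this is an illustration, not necessarily the committed fix:

```cpp
#include <cmath>

float safe_floor_ratio(float a, float b, float c)
{
  /* Avoid relying on `(a - c) / (b - c)` reaching exactly 1.0 when
   * `a == b`, which some compilers/hardware do not guarantee. */
  if (a == b) {
    return 1.0f;
  }
  return std::floor((a - c) / (b - c));
}
```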
This patch implements a new Gabor noise node based on [1] but with the
improvements from [2] and the phasor formulation from [3].
We compare with the most popular existing implementation, that of OSL,
from the user's point of view:
- This implementation produces C1 continuous noise as opposed to the
non-continuous OSL implementation, so it can be used for bump
mapping and is generally smoother. This is achieved by windowing the
Gabor kernel using a Hann window.
- The Bandwidth input of OSL was hard-coded to 1 and was replaced with
a frequency input, which OSL hard codes to 2, since frequency is
more natural to control. This is even more true now that the Gabor
kernel is windowed as opposed to truncated, which means increasing
the bandwidth will just turn the Gaussian component of the Gabor
into a Hann window, while decreasing the bandwidth will eliminate
the harmonic from the Gabor kernel, which is the point of Gabor
noise.
- OSL had three discrete modes of operation for orienting the kernel:
anisotropic, isotropic, and a hybrid mode. This implementation
instead provides a continuous Anisotropy parameter, which users are
already familiar with from the Glossy BSDF node.
- This implementation provides not just the Gabor noise value, but
also its phase and intensity components. The Gabor noise value is
basically sin(phase) * intensity, but the phase is arguably more
useful since it does not suffer from the low contrast issues that
Gabor suffers from, while the intensity is useful to hide the
singularities in the phase.
- This implementation converges faster than OSL's relative to the
impulse count, so we fix the impulse count to 8 for simplicity.
- This implementation does not implement anisotropic filtering.
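A sketch of a single Hann-windowed Gabor kernel evaluation in 2D,
following the description above; the parameterization is simplified
relative to the actual node, and the full noise sums such kernels over
nearby impulses:

```cpp
#include <cmath>

constexpr float PI = 3.14159265358979f;

/* Hann window over the normalized kernel radius, zero at r >= 1. */
float hann_window(float r)
{
  return (r < 1.0f) ? 0.5f + 0.5f * std::cos(PI * r) : 0.0f;
}

/* (x, y): position relative to the kernel center in kernel-radius units.
 * `frequency`: frequency of the harmonic. (ox, oy): kernel orientation. */
float gabor_kernel(float x, float y, float frequency, float ox, float oy)
{
  float r = std::sqrt(x * x + y * y);
  float gaussian = std::exp(-PI * r * r);
  float harmonic = std::cos(2.0f * PI * frequency * (x * ox + y * oy));
  return hann_window(r) * gaussian * harmonic;
}
```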
Future improvements to the node include implementing surface noise and
filtering, as well as extending the spectral control of the noise,
either by providing specialized kernels as was done in #110802, or by
providing some more procedural control over the frequencies of the
Gabor.
References:
[1]: Lagae, Ares, et al. "Procedural noise using sparse Gabor
convolution." ACM Transactions on Graphics (TOG) 28.3 (2009): 1-10.
[2]: Tavernier, Vincent, et al. "Making Gabor noise fast and
normalized." Eurographics 2019-40th Annual Conference of the European
Association for Computer Graphics. 2019.
[3]: Tricard, Thibault, et al. "Procedural phasor noise." ACM
Transactions on Graphics (TOG) 38.4 (2019): 1-13.
Pull Request: https://projects.blender.org/blender/blender/pulls/121820
Direct lighting is usually much smoother and has a
higher dynamic range than indirect lighting. Using
the R11G11B10 float format exhibits color shifts and
banding even in simple setups, without a way to mitigate
the issue.
Using RGB9_E5 encoding improves the quality while retaining
the storage benefit of 32-bit formats. The added overhead
of the software encoding is not perceptible in a full lighting
pass.
This affects direct lights and SSS convolution result.
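A sketch of RGB9_E5 encoding following the EXT_texture_shared_exponent
spec (three 9-bit mantissas sharing one 5-bit exponent, bias 15); the
shader version follows the same logic:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

uint32_t rgb9e5_encode(float r, float g, float b)
{
  constexpr int N = 9;  /* Mantissa bits. */
  constexpr int B = 15; /* Exponent bias. */
  constexpr float MAX = 511.0f / 512.0f * 65536.0f; /* Largest representable. */

  r = std::min(std::max(r, 0.0f), MAX);
  g = std::min(std::max(g, 0.0f), MAX);
  b = std::min(std::max(b, 0.0f), MAX);
  float max_c = std::max(r, std::max(g, b));
  if (max_c == 0.0f) {
    return 0u;
  }
  /* Shared exponent chosen from the largest component. */
  int exp_shared = std::max(-B - 1, int(std::floor(std::log2(max_c)))) + 1 + B;
  float scale = std::exp2(float(exp_shared - B - N));
  if (int(max_c / scale + 0.5f) == (1 << N)) {
    exp_shared += 1; /* Rounding overflowed the mantissa: bump the exponent. */
    scale *= 2.0f;
  }
  uint32_t rm = uint32_t(r / scale + 0.5f);
  uint32_t gm = uint32_t(g / scale + 0.5f);
  uint32_t bm = uint32_t(b / scale + 0.5f);
  return rm | (gm << 9) | (bm << 18) | (uint32_t(exp_shared) << 27);
}
```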
Fix #121937
Pull Request: https://projects.blender.org/blender/blender/pulls/122515
The goal of this is to make it easier to add more BSDF
support in the future. Avoids code fragmentation and
allows easy entry points to all algorithms using BSDFs.
Pull Request: https://projects.blender.org/blender/blender/pulls/122255
The Perlin noise algorithms suffer from precision issues when a coordinate
is greater than about 250000.
To fix this the Perlin noise texture is repeated every 100000 on each axis.
This causes discontinuities every 100000, however at such scales this
usually shouldn't be noticeable.
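A sketch of the per-axis wrapping, assuming a simple positive modulo;
the repeat period of 100000 matches the description above:

```cpp
#include <cmath>

float wrap_coordinate(float x)
{
  constexpr float PERIOD = 100000.0f;
  /* Positive modulo so negative coordinates wrap consistently. */
  float wrapped = std::fmod(x, PERIOD);
  return (wrapped < 0.0f) ? wrapped + PERIOD : wrapped;
}
```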
Pull Request: https://projects.blender.org/blender/blender/pulls/119884
This uses Spherical Harmonics to store the indirect lighting and
distant lighting visibility.
We can then reuse this information for each closure, which divides
the cost by 2 or 3 in many cases since the scanning is done only once.
The storage cost is higher than with the previous method, so we split
the resolution scaling to be independent of raytracing.
The spatial filtering has been split into its own pass for performance
reasons. Upsampling now only uses 4 bilinearly interpolated samples
(instead of 9), using bilateral weights to avoid bleeding.
This also adds a missing dot product (which softens the lighting
around corners) and fixes the blocky artifacts seen at lower
resolution.
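For illustration, evaluating a stored L1 spherical harmonic in a given
direction is what lets the same coefficients be reused for several
closures; a single channel is shown, and the convention is simplified
compared to the actual EEVEE code:

```cpp
struct Vec3 {
  float x, y, z;
};

float sh_l1_eval(float l0, Vec3 l1, Vec3 direction)
{
  /* Standard real spherical harmonic basis constants for bands 0 and 1. */
  const float y00 = 0.282095f;
  const float y1 = 0.488603f;
  return y00 * l0 +
         y1 * (l1.x * direction.x + l1.y * direction.y + l1.z * direction.z);
}
```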
Pull Request: https://projects.blender.org/blender/blender/pulls/118924
The output of the Color Ramp node in the GPU compositor and EEVEE is
slightly off. That's because the factor is evaluated directly at the
sampler without proper half-pixel offsets to account for the sampler's
linear interpolation, which this patch adds.
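A sketch of the corrected sampler coordinate, assuming a 1D ramp
texture of `size` texels sampled with linear interpolation:

```cpp
float ramp_sample_coordinate(float factor, int size)
{
  /* Map [0, 1] onto texel centers: 0 hits the center of the first texel
   * and 1 the center of the last, so linear interpolation never blends
   * past the intended endpoints. */
  return factor * float(size - 1) / float(size) + 0.5f / float(size);
}
```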
Pull Request: https://projects.blender.org/blender/blender/pulls/117677
The GPU compositor incorrectly extrapolates values of the RGBA Curves node.
That's because the code introduces a half-pixel offset to the color
values since they will be used to sample the curve maps. Those same
values are then used for extrapolation, which shouldn't take the
half-pixel value into account.
This patch fixes that by computing the sampler coordinate in a separate
step.
Pull Request: https://projects.blender.org/blender/blender/pulls/116586
This layout is more flexible and polymorphic.
While the worst case is worse (4 + 3 layers),
the common case is more optimized (2 + 2 layers).
The average written closure data is also lower
since we can compact the data for special cases
which are quite frequent.
Some adjustments had to be made in the denoise and
tile classify shaders.
Pull Request: https://projects.blender.org/blender/blender/pulls/115541
This adds correct object bounds estimation.
This works by creating an occupancy texture where one
bit represents one froxel. A geometry pre-pass fills this
occupancy texture and doesn't do any shading. Each bit
set to 0 is not considered occupied by the object
volume and will discard the material compute shader for
this froxel.
There are 2 methods of computing the occupancy map:
- Atomic XOR: For each fragment we compute the number of
froxel **centers** in front of it. We then convert that
into an occupancy bitmask that we apply to the occupancy
texture using `imageAtomicXor` (see the sketch after this
list). This is straightforward and works well for any
manifold geometry.
- Hit List: For each fragment we write the fragment depth
in a list (contained in one array texture). This list
is then processed by a fullscreen pass (see
`eevee_occupancy_convert_frag.glsl`) that sorts and
converts all the hits to the occupancy bits. This
emulates Cycles' behavior by considering only back-face
hits as exit events and front-face hits as entry events.
The result is stored to the occupancy texture using a
bit-wise `OR` operation to compose it with other non-hit-
list objects. This also decouples the hit-list evaluation
complexity from the material evaluation shader.
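A sketch of the Atomic XOR method's bitmask computation: given how many
froxel centers lie in front of a fragment, produce the mask for one
32-bit word of the occupancy texture. XOR-ing the masks of a manifold
object's front and back faces leaves 1s exactly in the froxels inside
the volume. Names and word size are illustrative assumptions:

```cpp
#include <cstdint>

uint32_t occupancy_word_mask(uint32_t froxels_in_front, uint32_t word_index)
{
  uint32_t word_start = word_index * 32u;
  if (froxels_in_front <= word_start) {
    return 0u; /* Fragment is in front of this whole word. */
  }
  uint32_t bits = froxels_in_front - word_start;
  if (bits >= 32u) {
    return ~0u; /* Fragment is behind this whole word. */
  }
  return (1u << bits) - 1u;
}
```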
## Limitations
### Fast
- Non-manifold geometry objects are rendered incorrectly.
- Non-manifold geometry objects will affect other objects
in front of them.
### Accurate
- Limited to 16 hits per layer for now.
- Non-manifold geometry objects will affect other objects
in front of them.
Pull Request: https://projects.blender.org/blender/blender/pulls/113731
Replaces all usages with the `gpu_shader_math`
equivalent. This is because the old shader
library was quite tangled.
This avoids dependency hell when trying to
mix libraries.
Changes are split into isolated commits, until
I had to do mass changes because of inter-
dependencies.
Pull Request: https://projects.blender.org/blender/blender/pulls/113631
This adds the possibility to create an
orthogonal basis around a given unit
vector.
The name was chosen to match the naming
convention already in place and match
the other matrix construction functions.
In other places (e.g. renderers), this same
function is commonly named `make_orthonormal`
or `make_basis`.
The function is not guaranteed to have a fixed
implementation and might change over time.
That's why the test only covers the
assumptions and not the raw values.
The implementation is borrowed from
Cycles and adapted to our math API.
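For illustration, one common branchless construction (Duff et al. 2017)
is in the same family as the Cycles code referenced above; as stated,
the exact implementation is not fixed and may change:

```cpp
#include <cmath>

struct Vec3 {
  float x, y, z;
};

/* Build tangent `t` and bitangent `b` orthogonal to unit vector `n`. */
void orthogonal_basis(const Vec3 &n, Vec3 &t, Vec3 &b)
{
  const float sign = std::copysign(1.0f, n.z);
  const float a = -1.0f / (sign + n.z);
  const float c = n.x * n.y * a;
  t = {1.0f + sign * n.x * n.x * a, sign * c, -sign * n.x};
  b = {c, sign + n.y * n.y * a, -n.y};
}
```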
Pull Request: https://projects.blender.org/blender/blender/pulls/113218
Shadow Map Ray Tracing is a technique that ray casts against the
shadow depth buffer. The technique is described in "Soft Shadows by
Ray Tracing Multilayer Transparent Shadow Maps".
Note that we only implement the single layer approach since storing
multiple depths is prohibitively expensive.
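A sketch of the single-layer idea: march a short ray in shadow-map
space and report occlusion when the ray dips below the stored depth.
Sampling details, step count, and the depth convention are simplifying
assumptions, not the actual EEVEE implementation:

```cpp
#include <functional>

struct Vec3 {
  float x, y, z;
};

/* `sample_depth(u, v)` returns the stored shadow-map depth at a position. */
bool shadow_ray_occluded(Vec3 start, Vec3 direction, int steps,
                         const std::function<float(float, float)> &sample_depth)
{
  for (int i = 1; i <= steps; i++) {
    float t = float(i) / float(steps);
    Vec3 p = {start.x + direction.x * t,
              start.y + direction.y * t,
              start.z + direction.z * t};
    /* The ray is blocked if it passes behind the recorded surface. */
    if (p.z > sample_depth(p.x, p.y)) {
      return true;
    }
  }
  return false;
}
```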
Pull Request: https://projects.blender.org/blender/blender/pulls/111809